Abstract


Two procedures for decoding linear systematic codes, majority decoding and a posteriori probability decoding, are formulated. The essential feature of both methods is a linear transformation of the parity-check equations of the code into "orthogonal parity checks." The decoding decisions are then made on the basis of the values assumed by these orthogonal parity checks. For binary codes, the principal component required in the circuitry for instrumenting these decoding rules is an ordinary threshold logical element. For this reason, we refer to these decoding rules as forms of "threshold decoding."

It is shown that threshold decoding can be applied effectively to convolutional codes up to approximately 100 transmitted bits in length over an interesting range of rates. Very simple decoding circuits are presented for such codes. However, it is also shown that the probability of error at the receiver cannot be made as small as desired by increasing the length of the code that is used with threshold decoding; rather, this probability approaches a nonzero limit as the code length is increased indefinitely. A large number of specific convolutional codes, suitable for threshold decoding, are tabulated. Some of these codes are obtained by hand construction and others by analytical techniques.

It is shown that threshold decoding is applicable to certain low-rate block codes, and that a generalization of the method is applicable to several other classes of block codes. It is shown that simple decoding circuits can be used for such codes. The theoretical limits of threshold decoding with block codes are still not clear, but the results presented here are promising.
GLOSSARY


Symbol  Definition

$A_i$  An orthogonal parity check on some noise digit, say $e_m$

$B_i$  Sum of the received digits, $r_m$ excluded, checked by $A_i$

$D$  Delay operator

$G^{(j)}(D)$  Code-generating polynomial in a convolutional code

$J$  Number of orthogonal parity checks

$k$  Number of information symbols in a code word

$k_0$  Number of information symbols per time unit in a convolutional code

$L$  Number of steps required to orthogonalize a block code

$m$  Degree of code-generating polynomials

$n$  Number of symbols in a code word

$n_0$  Number of symbols per time unit in a convolutional code

$n_A$  Actual constraint length of a convolutional code

$n_E$  Effective constraint length of a convolutional code

$n_i$  Number of symbols checked by $B_i$

$p_0$  $\Pr[e_m = 1]$, where $e_m$ is the bit to be determined by decoding

$p_0$  Transition probability of a binary symmetric channel

$p_i$  Probability of an odd number of errors in the bits checked by $B_i$

$p$  Erasure probability of a binary erasure channel

$P_1(e)$  Error probability in determining the set of first information symbols of a convolutional code

$q$  Number of elements in a finite field, GF(q)

$T$  Threshold

$w_i$  Weighting factor

$\binom{a}{b}$  Binomial coefficient

$\lfloor I \rfloor$  Greatest integer equal to or less than $I$

$\lceil I \rceil$  Least integer equal to or greater than $I$

$\{X_i\}$  Set of all elements $X_i$, where the index $i$ runs over all elements in the set

$\{f(D)\}$  Polynomial obtained from $f(D)$ by dropping all terms with power of $D$ greater than $m$
I. THE CONCEPT OF THRESHOLD DECODING


In 1948, Shannon first demonstrated that errors in the data transmitted over a noisy channel can be reduced to any desired level without sacrificing the data rate.[1] His "Noisy Coding Theorem" stated that such a channel is characterized by a quantity C, called the channel capacity, such that the error probability at the receiver can be made arbitrarily small by proper encoding and decoding of the data when, and only when, the rate R of information transmission is less than C. With the goals and limitations of their art thus clearly delineated, communication engineers have since focused considerable, and often ingenious, effort on the dual problems that stand in the way of full exploitation of Shannon's theorem, along the following lines:

(i) the construction of good codes that are readily instrumented, and
(ii) the development of simple and efficient decoding apparatus for such codes.

The first of these problems constitutes, by itself, no real obstacle. Let us for convenience confine our discussion at the moment to binary codes and transmission through a memoryless binary symmetric channel. A mathematical model of such a channel is shown in Fig. 1. When either a "one" or a "zero" is transmitted over this channel, it is received correctly with probability $q_0$ and incorrectly with probability $p_0 = 1 - q_0$. The channel capacity can be shown to be $1 - H(p_0)$ bits per transmitted symbol, where $H(x) = -x \log_2 x - (1-x) \log_2 (1-x)$ is the entropy function. Each message to be transmitted over this channel will be encoded into a block of n binary digits. The number of allowable messages is $2^{nR}$, where R is the rate of information transmission in bits per transmitted symbol when the input messages are all equiprobable.
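As a quick numerical check of these definitions, the following Python sketch (our own illustration; the function names are not from this report) evaluates the entropy function $H(x)$ and the capacity $1 - H(p_0)$ of the binary symmetric channel:

```python
import math


def binary_entropy(x: float) -> float:
    """Entropy function H(x) = -x log2 x - (1-x) log2(1-x), with H(0) = H(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def bsc_capacity(p0: float) -> float:
    """Capacity C = 1 - H(p0) of a binary symmetric channel, in bits per symbol."""
    return 1.0 - binary_entropy(p0)


if __name__ == "__main__":
    for p0 in (0.0, 0.01, 0.11, 0.5):
        print(f"p0 = {p0:<5} C = {bsc_capacity(p0):.4f}")
```

For instance, $p_0 \approx 0.11$ gives a capacity of about 0.5 bit per transmitted symbol, so on that channel only rates below roughly 1/2 can be made reliable.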

In proving his "noisy coding theorem" for the binary symmetric channel, Shannon showed that when R is less than C, the average error probability vanishes with increasing n for the ensemble of binary block codes in which each of the $2^{nR}$ message sequences is chosen at random with equal probability from the set of $2^n$ binary sequences of length n. Since a code taken at random from such an ensemble has probability less than 1/N of having error probability more than N times the ensemble average, one can quite reasonably select a good code at random. The encoding apparatus would, however, be prohibitively complex, since each of the $2^{nR}$ code words, of n bits each, would have to be stored.

Fig. 1. Binary symmetric channel.
Elias has shown that the much smaller ensemble of sliding parity-check codes has the same average probability of error as the ensemble in which all the code words are selected at random and, furthermore, that this error probability decreases exponentially with the block length n, with the optimum exponent for rates near C. Fano[4] has shown that a linear sequential network containing only n - 1 stages of shift register can serve as an encoder for a sliding parity-check code. Recently, Wozencraft[5] has found an even smaller ensemble of codes, which we shall call the randomly shifted codes, that again have the same ensemble average probability of error as the other random codes. Wozencraft's codes can be encoded by a linear sequential network containing the larger of Rn and (1-R)n stages of shift register (cf. section 2.5). From either the ensemble of sliding parity-check codes or the ensemble of randomly shifted codes, it is thus quite feasible to select a good, and readily instrumented, code by a random choice. However, at present, no way is known to solve the second coding problem for these ensembles, that is, the construction of a simple and efficient decoder. (The complications of the general decoding problem will be discussed below.)

Remarkably, efficient decoding procedures have been found for certain ensembles of randomly selected codes. The sequential decoding technique of Wozencraft was devised for the ensemble of convolutional codes.[6] Wozencraft and Reiffen[7] have demonstrated that for this method an exponential decrease in error probability can be attained by a decoder whose complexity and average number of computations increase by only a small power of the code length. Gallager[8] has found a similar result to apply to his iterative procedure for decoding the ensemble of parity-check codes that are constrained to have a low density of "ones" in the parity-check matrix. For both of these schemes, the decoding effort is small only when the information transmission rate is less than some "computational rate," which is less than channel capacity. These two decoding procedures (and the modified forms of sequential decoding recently proposed by Ziv[9] and Fano[10]) are the only known general solutions to the dual coding problems described above. Both of these methods have the disadvantages that the amount of computation per decoded digit is a random variable, and, for short codes, the decoders are more complex than those that can be constructed for certain specific codes by using other techniques.

A second approach to the coding problems stands in sharp contrast to the probabilistic approach taken by Wozencraft and Gallager. Based on the pioneering work of Hamming[11] and Slepian,[12] the algebraists take as their starting point a formal mathematical structure that is demonstrated to have metric properties that are desirable for coding purposes. Often the decoding methods are found after the codes have been known for some time, as is illustrated most strikingly by the Reed algorithm[13] for the Muller codes,[14] and the Peterson algorithm[15] for the Bose-Chaudhuri codes.[16] Because of the systematic structure of the codes obtained by the algebraic approach, the decoding procedures usually require a fixed amount of computation. Very likely, for the same reason, the codes of any one known type always become poor as the code length is increased with the rate held constant. (The single exception to this statement is the iterated coding of Elias,[17] which does succeed in making the error probability vanish, but more slowly than the optimum exponential rate with block length, and only for rates that are substantially less than channel capacity.)


1.1 THRESHOLD DECODING OF LINEAR CODES


Against this background, we can describe how the concept of threshold decoding, which will be introduced in this report, is related to the previous approaches to the coding problems. As in the probabilistic approach, we shall begin with a decoding algorithm rather than with specific codes. However, our algorithm will be primarily algebraic and will require a fixed amount of computation per decoded symbol. Because of the special algebraic code structure that it requires, the algorithm will not be applicable to whole ensembles of codes; rather, specific codes to which it applies will have to be constructed. Finally, and most important, we choose as our algorithm a procedure that lends itself to simple machine implementation. Before outlining the decoding procedure, we shall present some preliminary remarks on linear, or group, coding.[18]

a.  Linear  Codes  and  the  Decoding  Problem 

We shall be concerned primarily with linear codes in systematic form. The set of code words for such a code is a subset of the set of n-tuples of the form

$$(t_1, t_2, \ldots, t_n), \tag{1}$$

where each $t_i$ is an element of GF(q), the finite field of q elements (cf. Appendix A). The symbols $t_1, t_2, \ldots, t_k$ are chosen to be the information symbols. The remaining n-k symbols are called the parity symbols and are determined from the information symbols by a set of linear equations

$$t_j = \sum_{i=1}^{k} c_{ji}\, t_i, \qquad j = k+1, k+2, \ldots, n, \tag{2}$$

where the set of coefficients, $c_{ji}$, are elements of GF(q) specified by a particular code. (All operations on these symbols are to be performed in GF(q) unless expressly stated otherwise.)
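Equation 2 translates directly into an encoder. The sketch below is illustrative only: it assumes q is prime, so that GF(q) arithmetic reduces to integer arithmetic modulo q (a prime-power field would need a genuine field implementation), and it uses a small binary (7, 3) code of our own choosing, with $t_4 = t_1 + t_2$, $t_5 = t_1 + t_3$, $t_6 = t_2 + t_3$, $t_7 = t_1 + t_2 + t_3$.

```python
from typing import List, Sequence


def encode_systematic(info: Sequence[int], C: Sequence[Sequence[int]], q: int = 2) -> List[int]:
    """Encode per Eq. 2 over GF(q), assuming q is prime.

    info: the k information symbols t_1..t_k.
    C:    C[r][i] is the coefficient c_{j,i+1} of parity symbol t_j, j = k+1+r.
    Returns the code word (t_1, ..., t_n) with n = k + len(C).
    """
    parity = [sum(c * t for c, t in zip(row, info)) % q for row in C]
    return list(info) + parity


# Hypothetical binary (7, 3) code used for illustration (not taken from the report).
C_EXAMPLE = [
    [1, 1, 0],  # t4 = t1 + t2
    [1, 0, 1],  # t5 = t1 + t3
    [0, 1, 1],  # t6 = t2 + t3
    [1, 1, 1],  # t7 = t1 + t2 + t3
]

if __name__ == "__main__":
    print(encode_systematic([1, 0, 0], C_EXAMPLE))  # [1, 0, 0, 1, 1, 0, 1]
```

The first k positions of the output are the information symbols themselves, which is what "systematic" means here.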

We assume now that after transmission through some channel, a received n-tuple $(r_1, r_2, \ldots, r_n)$ is obtained which differs from the transmitted n-tuple $(t_1, t_2, \ldots, t_n)$ by a noise sequence $(e_1, e_2, \ldots, e_n)$, that is,

$$r_i = t_i + e_i, \qquad i = 1, 2, \ldots, n, \tag{3}$$

where again $r_i$ and $e_i$ are elements of GF(q), and all arithmetic operations are carried out in this field. It then follows from Eq. 3 that knowledge of the received sequence and the noise sequence suffices to determine the transmitted sequence.

It can be readily verified that the set of transmitted code words forms an additive Abelian group (cf. Appendix A for the defining axioms of such a group). There are $q^k$ code words, since each of the k information symbols can have any one of q values. Let $(t_1, t_2, \ldots, t_n)$ and $(t_1^*, t_2^*, \ldots, t_n^*)$ be any two code words. Then, by Eq. 2, their sum is the n-tuple $(t_1 + t_1^*, t_2 + t_2^*, \ldots, t_n + t_n^*)$, where

$$t_j + t_j^* = \sum_{i=1}^{k} c_{ji}\,(t_i + t_i^*), \qquad j = k+1, k+2, \ldots, n, \tag{4}$$

and this is the same n-tuple as is obtained by encoding $t_i + t_i^*$, $i = 1, 2, \ldots, k$, as information symbols. Thus the first group axiom (closure) is satisfied, and the other axioms are satisfied trivially.
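The closure argument can be checked exhaustively for a small case. This sketch assumes a prime q (here q = 2) and a hypothetical binary (7, 3) code with parity equations $t_4 = t_1 + t_2$, $t_5 = t_1 + t_3$, $t_6 = t_2 + t_3$, $t_7 = t_1 + t_2 + t_3$; it enumerates all $q^k$ code words and confirms that every pairwise sum is again the encoding of its own information part.

```python
from itertools import product
from typing import Sequence, Tuple

Q = 2  # field size; assumed prime so that GF(Q) arithmetic is integer arithmetic mod Q
# Hypothetical (7, 3) code: row r lists c_{j,1..3} for parity symbol t_j, j = 4 + r.
C = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
K = len(C[0])


def encode(info: Sequence[int]) -> Tuple[int, ...]:
    """Systematic encoding per Eq. 2."""
    parity = [sum(c * t for c, t in zip(row, info)) % Q for row in C]
    return tuple(info) + tuple(parity)


def add(u: Sequence[int], v: Sequence[int]) -> Tuple[int, ...]:
    """Component-wise addition of two n-tuples in GF(Q)."""
    return tuple((a + b) % Q for a, b in zip(u, v))


codewords = {encode(info) for info in product(range(Q), repeat=K)}
assert len(codewords) == Q ** K  # q^k distinct code words

# Closure: each sum of two code words equals the encoding of its own
# first k symbols, i.e. of the sum of the two information parts (Eq. 4).
for x in codewords:
    for y in codewords:
        z = add(x, y)
        assert z == encode(z[:K])
print(f"{len(codewords)} code words, closed under addition")
```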

We shall always use the notation, an (n, k) code, to mean a linear systematic code as defined by Eq. 2.

Equation 2 may be rewritten as

$$\sum_{i=1}^{k} c_{ji}\, t_i - t_j = 0, \qquad j = k+1, k+2, \ldots, n, \tag{5}$$


and we say that each of these n-k equations defines a parity set for the code, that is, some weighted sum of code symbols which is zero for all code words.

We shall define a parity check to be the sum in Eq. 5 formed at the receiver, that is,

$$s_j = \sum_{i=1}^{k} c_{ji}\, r_i - r_j, \qquad j = k+1, k+2, \ldots, n. \tag{6}$$


Using Eqs. 3 and 5 (the terms in the transmitted symbols sum to zero), we may rewrite the parity checks as

$$s_j = \sum_{i=1}^{k} c_{ji}\, e_i - e_j, \qquad j = k+1, k+2, \ldots, n, \tag{7}$$

from which we see that the $\{s_j\}$ constitute a set of n-k linear equations in the n unknowns $\{e_i\}$. (We shall use the notation $\{a_i\}$ to mean the set of objects $a_i$, where i ranges over all indices for which the objects are defined.) The general solution of Eq. 7 can be written immediately as


$$e_j = \sum_{i=1}^{k} c_{ji}\, e_i - s_j, \qquad j = k+1, k+2, \ldots, n. \tag{8}$$

This general solution has k arbitrary constants, namely the values of $e_1, e_2, \ldots, e_k$. Each of these arbitrary constants can be assigned any of q values, and thus there are $q^k$ distinct solutions of Eq. 7.
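To make Eqs. 6 to 8 concrete, the sketch below uses a hypothetical binary (7, 3) code ($t_4 = t_1 + t_2$, $t_5 = t_1 + t_3$, $t_6 = t_2 + t_3$, $t_7 = t_1 + t_2 + t_3$). It computes the parity checks from a received word, confirms that they equal the parity checks of the noise alone, and lists the $q^k = 8$ solutions of Eq. 7 given by Eq. 8. Indices in the code are 0-based, so $e_1$ is `e[0]`.

```python
from itertools import product
from typing import List, Sequence

Q = 2  # binary case; arithmetic is mod Q
# Hypothetical (7, 3) code: row r lists c_{j,1..3} for parity symbol j = 4 + r.
C = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
K = 3


def parity_checks(r: Sequence[int]) -> List[int]:
    """Eq. 6: s_j = sum_i c_{ji} r_i - r_j, j = k+1..n (computed mod Q)."""
    return [(sum(c * x for c, x in zip(row, r[:K])) - r[K + idx]) % Q
            for idx, row in enumerate(C)]


def solutions_of_eq7(s: Sequence[int]) -> List[List[int]]:
    """Eq. 8: choose e_1..e_k freely, then set e_j = sum_i c_{ji} e_i - s_j."""
    sols = []
    for head in product(range(Q), repeat=K):
        tail = [(sum(c * e for c, e in zip(row, head)) - sj) % Q
                for row, sj in zip(C, s)]
        sols.append(list(head) + tail)
    return sols


t = [1, 0, 0, 1, 1, 0, 1]            # a code word of the example code
e = [0, 1, 0, 0, 0, 0, 0]            # a single error in position 2
r = [(a + b) % Q for a, b in zip(t, e)]
s = parity_checks(r)
assert s == parity_checks(e)         # Eq. 7: the checks depend on the noise only
assert e in solutions_of_eq7(s)      # the true noise is one of the q^k solutions
print(s, len(solutions_of_eq7(s)))   # [1, 0, 1, 1] 8
```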
We have not yet considered the mechanism by which the noise symbols $\{e_i\}$ are generated. The $\{e_i\}$ can be considered as sample points from the random process described by the communication channel. For instance, it is readily seen that the binary symmetric channel in Fig. 1 is fully equivalent to the model in Fig. 2, for which the noise source is an independent letter source that has probability $p_0$ of giving a "one" output, and probability $q_0$ of giving a "zero" output. In this case, a noise sequence of n bits that contains w "ones" has a probability of occurrence of $p_0^w q_0^{n-w}$, and this is a monotonically decreasing function of w, the number of errors in the received set of n bits (we assume $p_0 < 1/2$).

Fig. 2. Noise-source model of the binary symmetric channel: the noise-source output $e_i$ is added to $t_i$ in a mod-2 adder to give $r_i$.

The general decoding problem for linear codes is to find that solution of Eq. 7 that is most probable from consideration of the channel. For example, when binary data are transmitted over a binary symmetric channel, the problem is to find that solution of Eq. 7 that contains the smallest number of "ones." In practice, it is generally not feasible to find the most probable solution of Eq. 7 for an arbitrary parity-check pattern $\{s_j\}$, simply because of the enormous number of possibilities ($q^k$ solutions for each pattern). An efficient solution of the decoding problem depends on finding a simple method for determining the most probable solution of Eq. 7 for a high-probability subset of the set of all possible parity-check patterns.
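For the binary symmetric channel, the most probable solution of Eq. 7 can be found by brute force when k is tiny, which also shows why this approach does not scale: the search visits all $2^k$ solutions. A sketch, assuming a hypothetical binary (7, 3) code with $t_4 = t_1 + t_2$, $t_5 = t_1 + t_3$, $t_6 = t_2 + t_3$, $t_7 = t_1 + t_2 + t_3$:

```python
from itertools import product
from typing import Sequence, Tuple

# Hypothetical binary (7, 3) code: row r lists c_{j,1..3} for parity symbol j = 4 + r.
C = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
K = 3


def noise_probability(e: Sequence[int], p0: float) -> float:
    """Pr(e) = p0^w q0^(n-w) on a BSC, where w is the number of ones in e."""
    w = sum(e)
    return p0 ** w * (1 - p0) ** (len(e) - w)


def most_probable_noise(s: Sequence[int], p0: float) -> Tuple[int, ...]:
    """Brute-force search over the 2^k solutions of Eq. 7 (feasible only for tiny k).

    In GF(2) subtraction is addition, so Eq. 8 reads e_j = sum_i c_{ji} e_i + s_j.
    """
    best = None
    for head in product((0, 1), repeat=K):
        tail = tuple((sum(c * x for c, x in zip(row, head)) + sj) % 2
                     for row, sj in zip(C, s))
        cand = head + tail
        if best is None or noise_probability(cand, p0) > noise_probability(best, p0):
            best = cand
    return best


if __name__ == "__main__":
    # Parity checks produced by a single error in position 2 (e_2 = 1).
    print(most_probable_noise([1, 0, 1, 1], p0=0.05))  # (0, 1, 0, 0, 0, 0, 0)
```

With $p_0 < 1/2$, maximizing $p_0^w q_0^{n-w}$ is the same as minimizing w, and for this example the weight-1 solution is unique because the code has minimum distance 4.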

b.  Orthogonal  Parity  Checks 

We shall now consider a procedure by which the parity checks $\{s_j\}$ of Eq. 6 can be transformed into another set of quantities that will be found more convenient for decoding purposes. We begin by considering linear combinations of the $\{s_j\}$. We define a composite parity check, $A_i$, to be a linear combination of the $\{s_j\}$. That is, each $A_i$ is given by an equation

$$A_i = \sum_{j=k+1}^{n} b_{ij}\, s_j, \tag{9}$$

where the coefficients, $b_{ij}$, are again elements of GF(q). From Eqs. 7 and 9, we have
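$$A_i = \sum_{j=k+1}^{n} b_{ij} \left( \sum_{h=1}^{k} c_{jh}\, e_h - e_j \right), \tag{10}$$

which may be written

$$A_i = \sum_{j=1}^{n} a_{ij}\, e_j, \tag{11}$$

where

$$a_{ij} = \begin{cases} \displaystyle\sum_{h=k+1}^{n} b_{ih}\, c_{hj}, & j = 1, 2, \ldots, k, \\ -b_{ij}, & j = k+1, k+2, \ldots, n. \end{cases} \tag{12}$$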

It is now convenient to make the following definition.

DEFINITION: A set of J composite parity checks, $\{A_i\}$, is said to be orthogonal on $e_m$ if in Eq. 11

$$a_{im} = 1, \qquad i = 1, 2, \ldots, J, \tag{13}$$

and, for every fixed index j different from m,

$$a_{ij} = 0 \quad \text{for all, but at most one, index } i. \tag{14}$$


In other words, a set of J composite parity checks is called orthogonal on $e_m$ if $e_m$ is checked by each member of the set, but no other noise digit is checked by more than one member of the set. Thus $e_m$ is able to affect all of the equations in the set, but no other noise bit can affect more than a single equation in the set.
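The definition can be tested mechanically: form each $A_i$ from a choice of coefficients $b_{ij}$, expand it into the $a_{ij}$ of Eq. 12, and check conditions 13 and 14. The sketch below assumes the binary case (so $-b_{ij} = b_{ij}$) and a hypothetical (7, 3) code with $s_4 = e_1 + e_2 + e_4$, $s_5 = e_1 + e_3 + e_5$, $s_6 = e_2 + e_3 + e_6$, $s_7 = e_1 + e_2 + e_3 + e_7$; the choice $A_1 = s_4$, $A_2 = s_5$, $A_3 = s_6 + s_7$ gives J = 3 checks orthogonal on $e_1$ (index 0 in the code).

```python
from typing import List, Sequence

C = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]  # hypothetical (7, 3) code
K, N = 3, 7


def check_rows() -> List[List[int]]:
    """Coefficients of e_1..e_n in each s_j of Eq. 7 (binary, so -e_j = +e_j)."""
    rows = []
    for idx, c in enumerate(C):
        row = list(c) + [0] * (N - K)
        row[K + idx] = 1
        rows.append(row)
    return rows


def composite(b: Sequence[int]) -> List[int]:
    """Coefficients a_{ij} of A_i = sum_j b_ij s_j (Eqs. 9-12), reduced mod 2."""
    rows = check_rows()
    return [sum(bj * row[col] for bj, row in zip(b, rows)) % 2 for col in range(N)]


def is_orthogonal(A: Sequence[Sequence[int]], m: int) -> bool:
    """Eqs. 13-14: every A_i checks e_m; any other e_j is checked by at most one A_i."""
    if any(a[m] != 1 for a in A):
        return False
    return all(sum(1 for a in A if a[j] != 0) <= 1 for j in range(N) if j != m)


# b-vectors over (s4, s5, s6, s7): A1 = s4, A2 = s5, A3 = s6 + s7.
A = [composite(b) for b in ([1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1])]
print(A)
print(is_orthogonal(A, m=0))  # True
```

Replacing $A_3$ by $s_7$ alone would fail the test, since $e_2$ would then be checked by both $s_4$ and $s_7$.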


c.  Majority  Decoding 

We shall now give the first of two algorithms that can be used to determine $e_m$ from a set of J parity checks $\{A_i\}$ orthogonal on $e_m$. We shall use the notation $\lfloor I \rfloor$ and $\lceil I \rceil$ to mean the greatest integer equal to or less than I and the least integer equal to or greater than I, respectively, and we shall say that a noise bit is checked by some parity check if and only if it appears with a nonzero coefficient in the equation for that parity check.

THEOREM 1: Provided that $\lfloor J/2 \rfloor$ or fewer of the $\{e_i\}$ that are checked by a set of J parity checks $\{A_i\}$ orthogonal on $e_m$ are nonzero (that is, there are $\lfloor J/2 \rfloor$ or fewer errors in the corresponding received symbols), then $e_m$ is given correctly as that value of GF(q) which is assumed by the greatest fraction of the $\{A_i\}$. (Assume $e_m = 0$ in the case for which no value is assumed by a strict plurality of the $\{A_i\}$, and 0 is one of the several values with most occurrences.)
PROOF 1: Suppose $e_m = V$ and assume initially that all other $e_i$ that are checked have value 0. Then from Eqs. 11 and 13 it follows that

$$A_i = V, \qquad i = 1, 2, \ldots, J. \tag{15}$$

Now suppose V = 0 and the conditions of the theorem are satisfied. By orthogonality, each of the $\lfloor J/2 \rfloor$ or fewer nonzero noise digits appears in at most one $A_i$, so together they can change at most the same number of the equations in Eq. 15, and hence at least $\lceil J/2 \rceil$ of the $\{A_i\}$ are still zero. Thus zero is either the value with most occurrences in the set $\{A_i\}$ or one of the two values with the same largest number of occurrences; in both cases the decoding rule of the theorem gives the correct value of $e_m$.

Conversely, suppose that $V \neq 0$. Then since $e_m = V \neq 0$, it follows that fewer than $\lfloor J/2 \rfloor$ of the other noise digits checked by the $\{A_i\}$ are nonzero, and hence that more than $\lceil J/2 \rceil$ of the equations in Eq. 15 are still correct. Hence the decoding rule of the theorem is again correct, and the theorem is proved.
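A direct rendering of the majority rule of Theorem 1 over GF(q), for prime q so that field arithmetic is arithmetic mod q, is sketched below, together with an exhaustive check of the theorem in a deliberately simple toy model (our assumption, not the report's) in which each $A_i$ checks $e_m$ and exactly one other noise digit.

```python
from collections import Counter
from itertools import product
from typing import Sequence


def majority_decode(A: Sequence[int]) -> int:
    """Theorem 1: the GF(q) value taken by the greatest number of the A_i.

    Ties that include 0 are resolved as e_m = 0, as in the theorem. A tie that
    excludes 0 can only arise outside the theorem's hypotheses; this sketch
    then returns the smallest tied value (an arbitrary choice of ours).
    """
    counts = Counter(A)
    top = max(counts.values())
    winners = sorted(v for v, c in counts.items() if c == top)
    return 0 if 0 in winners else winners[0]


def verify_theorem1(q: int, J: int) -> bool:
    """Exhaustive check of Theorem 1 over GF(q), q prime, in a toy model where
    A_i = e_m + x_i and x_i is the only other noise digit checked by A_i."""
    for em, *x in product(range(q), repeat=J + 1):
        errors = (em != 0) + sum(1 for xi in x if xi != 0)
        if errors > J // 2:
            continue  # more than floor(J/2) errors: the theorem makes no claim
        A = [(em + xi) % q for xi in x]
        if majority_decode(A) != em:
            return False
    return True


if __name__ == "__main__":
    print(verify_theorem1(q=3, J=4), verify_theorem1(q=2, J=5))  # True True
```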

If J = 2T is an even number, it follows from Theorem 1 that $e_m$ can be correctly determined whenever T or fewer of the received symbols are in error. Similarly, if J = 2T + 1 is an odd number, then again $e_m$ can be found whenever T or fewer errors occur, and in addition T + 1 errors can be detected by saying that a detectable error has occurred when the value assumed by the greatest fraction of the $\{A_i\}$ is nonzero and has exactly T + 1 occurrences. These considerations imply that any two code words with different values of $t_m$ must be at least distance J + 1 apart. (We shall always use distance to mean Hamming distance, that is, the number of symbols in which the code words differ.) Hence, Theorem 1 has the following corollary.

COROLLARY: Given a linear code for which it is possible to form a set of J parity checks orthogonal on $e_m$, then any two code words for which $t_m$ differs are separated by a Hamming distance of at least J + 1.

We shall refer to decoding performed according to the algorithm given in Theorem 1 as majority decoding of orthogonal parity checks. It should be clear from the preceding discussion that majority decoding is a form of minimum distance decoding; that is, when decoding $e_m$, majority decoding assigns to $e_m$ the value that it takes on in the noise sequence of minimum weight that satisfies Eqs. 11. It is important to note that this noise sequence does not in general coincide with the noise sequence of minimum weight that satisfies Eqs. 7, since the mapping from the $\{s_j\}$ to the $\{A_i\}$ need not be one-to-one.
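The corollary can be checked on a small example. For a hypothetical binary (7, 3) code with $t_4 = t_1 + t_2$, $t_5 = t_1 + t_3$, $t_6 = t_2 + t_3$, $t_7 = t_1 + t_2 + t_3$, the checks $s_4$, $s_5$, and $s_6 + s_7$ are orthogonal on $e_1$ (J = 3), so code words that differ in $t_1$ should be at least 4 apart. The sketch computes that distance by enumeration:

```python
from itertools import product

C = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]  # hypothetical (7, 3) binary code
K = 3


def encode(info):
    """Systematic binary encoding per Eq. 2."""
    return tuple(info) + tuple(sum(c * t for c, t in zip(row, info)) % 2 for row in C)


def hamming(u, v):
    """Number of positions in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))


words = [encode(info) for info in product((0, 1), repeat=K)]
# Smallest distance between code words that differ in t_1 (index 0).
d1 = min(hamming(u, v) for u in words for v in words if u[0] != v[0])
print(d1)  # 4
```

Here the bound J + 1 = 4 is met with equality.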

d.  A  Posteriori  Probability  Decoding 

Majority decoding is inefficient in the sense that it does not take into account the details of the channel statistics, that is, of the probability distributions of the noise bits. We shall now give a decoding algorithm that makes the best possible use of the information contained in a set of J parity checks orthogonal on $e_m$ in arriving at a decision on the value of $e_m$.

We assume the noise sequence is additive and is independent from digit to digit, but is not otherwise restricted. This means that the channel must be expressible in the form of the model in Fig. 2, in which the noise source is an independent letter source and the adder operates in GF(q).

Taking average probability of error as the criterion of goodness, we seek a decoding algorithm that will assign to $e_m$ that value V for which the conditional probability

$$\Pr(e_m = V \mid \{A_i\}) \tag{16}$$

is a maximum. Using Bayes' rule, we have

$$\Pr(e_m = V \mid \{A_i\}) = \frac{\Pr(\{A_i\} \mid e_m = V)\,\Pr(e_m = V)}{\Pr(\{A_i\})}. \tag{17}$$

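Because of the orthogonality on $e_m$ of the $\{A_i\}$ and the digit-to-digit independence of the noise sequence, it follows that, once $e_m$ is fixed, each $A_i$ depends on a disjoint set of independent noise digits, so that

$$\Pr(\{A_i\} \mid e_m = V) = \prod_{i=1}^{J} \Pr(A_i \mid e_m = V). \tag{18}$$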


Substituting Eqs. 17 and 18 in Eq. 16 and taking logarithms (the denominator $\Pr(\{A_i\})$ does not depend on V and can be dropped), we can phrase the decoding rule as follows: Choose $e_m$ to be that value V for which

$$\log[\Pr(e_m = V)] + \sum_{i=1}^{J} \log[\Pr(A_i \mid e_m = V)] \tag{19}$$

is a maximum. For emphasis, we state this result as a theorem.

THEOREM 2: Given a set $\{A_i\}$ of J parity checks orthogonal on $e_m$, and that the noise sequence is additive with digit-to-digit independence, then the decoding rule based on $\{A_i\}$ which determines $e_m$ with the least average probability of error is: Choose $e_m$ to be that value V of GF(q) for which

$$\log[\Pr(e_m = V)] + \sum_{i=1}^{J} \log[\Pr(A_i \mid e_m = V)]$$

is a maximum.

We shall refer to decoding performed according to the algorithm of Theorem 2 as a posteriori probability decoding of orthogonal parity checks, or, more simply, as APP decoding.
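The rule of Theorem 2 is easy to express in code once $\Pr(A_i \mid e_m = V)$ is available. The sketch below assumes q prime and models each check as $A_i = e_m + y_i$, where $y_i$ is the sum of the other noise digits checked by $A_i$ and its distribution is supplied by the caller; by orthogonality and digit-to-digit independence the $y_i$ are independent, which is what justifies adding the log terms. The function and parameter names are our own.

```python
import math
from typing import Sequence


def app_decode(A: Sequence[int], prior: Sequence[float],
               other: Sequence[Sequence[float]], q: int) -> int:
    """APP decoding of e_m from J checks orthogonal on e_m (Theorem 2), GF(q) with q prime.

    A[i]        : observed value of check A_{i+1}
    prior[v]    : Pr(e_m = v)
    other[i][y] : Pr(y_i = y), where y_i is the GF(q) sum of the noise digits
                  other than e_m checked by A_{i+1}, so that A_{i+1} = e_m + y_i
    Returns the value V maximizing log Pr(e_m=V) + sum_i log Pr(A_i | e_m=V) (Eq. 19).
    """
    def log_or_neg_inf(x: float) -> float:
        return math.log(x) if x > 0 else -math.inf

    def score(v: int) -> float:
        total = log_or_neg_inf(prior[v])
        for a, dist in zip(A, other):
            # Pr(A_i = a | e_m = v) = Pr(y_i = a - v), subtraction in GF(q)
            total += log_or_neg_inf(dist[(a - v) % q])
        return total

    return max(range(q), key=score)


if __name__ == "__main__":
    prior = [0.9, 0.1]        # Pr(e_m = 0), Pr(e_m = 1)
    other = [[0.8, 0.2]] * 3  # each y_i equals 1 with probability 0.2
    print(app_decode([1, 1, 0], prior, other, q=2))  # 0
    print(app_decode([1, 1, 1], prior, other, q=2))  # 1
```

The first call illustrates the difference from majority decoding: two of the three checks equal 1, but the prior $\Pr(e_m = 1) = 0.1$ outweighs them.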


e.  Threshold  Decoding 

Let us now consider the specialization of the majority decoding and APP decoding algorithms to the binary case. Let $p_0 = 1 - q_0$ be the error probability for bit $e_m$, that is,

$$\Pr(e_m = 1) = p_0. \tag{20}$$
If $p_i = 1 - q_i$ denotes the probability of an odd number of "ones" among the noise bits, exclusive of $e_m$, that are checked by $A_i$, the i-th parity check orthogonal on $e_m$, then it follows that

$$\Pr(A_i = 0 \mid e_m = 1) = \Pr(A_i = 1 \mid e_m = 0) = p_i \tag{21}$$

and

$$\Pr(A_i = 1 \mid e_m = 1) = \Pr(A_i = 0 \mid e_m = 0) = q_i. \tag{22}$$

Since $e_m$ can have only two possible values, 0 or 1, it follows that the APP decoding rule is simply: Choose $e_m = 1$ if, and only if,

$$\log(p_0) + \sum_{i=1}^{J} \log[\Pr(A_i \mid e_m = 1)] > \log(q_0) + \sum_{i=1}^{J} \log[\Pr(A_i \mid e_m = 0)]. \tag{23}$$


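Using Eqs. 21 and 22, for which $\log[\Pr(A_i \mid e_m = 1)] - \log[\Pr(A_i \mid e_m = 0)] = (2A_i - 1)\log(q_i/p_i)$, we can reduce (23) to

$$\sum_{i=1}^{J} (2A_i - 1) \log(q_i/p_i) > \log(q_0/p_0) \tag{24}$$

or

$$\sum_{i=1}^{J} A_i \left[ 2 \log(q_i/p_i) \right] > \sum_{i=0}^{J} \log(q_i/p_i), \tag{25}$$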

where the $\{A_i\}$ are treated as real numbers in (24) and (25). We summarize these results in the next theorem.

THEOREM 3: For a binary memoryless channel with additive noise, the APP decoding rule becomes: Choose $e_m = 1$ if, and only if, the sum of the members of the set $\{A_i\}$ of J parity checks orthogonal on $e_m$, treated as real numbers and weighted by the factor $2 \log(q_i/p_i)$, exceeds the threshold value

$$\sum_{i=0}^{J} \log(q_i/p_i),$$

where $p_0 = 1 - q_0 = \Pr(e_m = 1)$ and $p_i = 1 - q_i$ is the probability of an odd number of errors in the symbols, exclusive of $e_m$, that are checked by the i-th orthogonal parity check.

In a similar way, Theorem 1 for the binary case reduces to the following theorem.

THEOREM 4: Given a set $\{A_i\}$ of J parity checks orthogonal on $e_m$, then the majority decoding rule is: Choose $e_m = 1$ if, and only if, the sum of the $A_i$ (as real numbers) exceeds the threshold value $\lfloor J/2 \rfloor$.

Because of the similarity of the decoding rules of Theorems 3 and 4, and because these decoding rules can be instrumented by means of a simple threshold logical element, we use the generic term, threshold decoding, to describe either majority decoding or APP decoding of orthogonal parity checks. For convenience, we shall use the same nomenclature for nonbinary decoding.
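The two binary threshold elements can be sketched as follows. As an assumption for illustration, the $p_i$ are computed for a binary symmetric channel from the number of other bits checked by each $A_i$, using the standard identity that the probability of an odd number of errors among n independent bits, each in error with probability $p_0$, is $[1 - (1 - 2p_0)^n]/2$. The helper `posterior_decode` (our own name, like the others) compares the two sides of Eq. 23 without logarithms, so the exhaustive check confirms that the weighted threshold of Theorem 3 reproduces the APP decision.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple


def odd_error_probability(n_other: int, p0: float) -> float:
    """p_i for a check whose n_other other bits each err independently with prob. p0."""
    return (1 - (1 - 2 * p0) ** n_other) / 2


def app_weights(p: Sequence[float]) -> Tuple[List[float], float]:
    """Theorem 3, with p[0] = p_0 = Pr(e_m = 1) and p[1:] = p_1..p_J.

    Returns the weights w_i = 2 log(q_i/p_i), i = 1..J, and the
    threshold T = sum_{i=0}^{J} log(q_i/p_i).
    """
    logs = [math.log((1 - pi) / pi) for pi in p]
    return [2 * x for x in logs[1:]], sum(logs)


def threshold_decode(A: Sequence[int], w: Sequence[float], T: float) -> int:
    """APP threshold element: e_m = 1 iff sum_i w_i A_i (A_i as reals) exceeds T."""
    return int(sum(wi * ai for wi, ai in zip(w, A)) > T)


def majority_threshold_decode(A: Sequence[int]) -> int:
    """Theorem 4: e_m = 1 iff the sum of the A_i exceeds floor(J/2)."""
    return int(sum(A) > len(A) // 2)


def posterior_decode(A: Sequence[int], p: Sequence[float]) -> int:
    """Reference rule: compare Pr(e_m=1)Pr({A_i}|e_m=1) with Pr(e_m=0)Pr({A_i}|e_m=0)."""
    like1, like0 = p[0], 1 - p[0]
    for ai, pi in zip(A, p[1:]):
        like1 *= (1 - pi) if ai else pi   # Eqs. 21-22 with e_m = 1
        like0 *= pi if ai else (1 - pi)   # Eqs. 21-22 with e_m = 0
    return int(like1 > like0)


def agrees_with_posterior(p: Sequence[float]) -> bool:
    """Check the threshold rule against the reference rule for every pattern of A_i."""
    w, T = app_weights(p)
    J = len(p) - 1
    return all(threshold_decode(A, w, T) == posterior_decode(A, p)
               for A in product((0, 1), repeat=J))


if __name__ == "__main__":
    p0 = 0.05
    p = [p0] + [odd_error_probability(n, p0) for n in (2, 3, 4, 6)]
    print(agrees_with_posterior(p))  # True
```

Note that the two thresholds need not agree: for odd J, even when every $p_i$ equals $p_0$, the APP rule requires one more check equal to 1 than the majority rule does, because the prior term $\log(q_0/p_0)$ biases the decision toward $e_m = 0$.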

1.2  SUMMARY 

By considering only orthogonal parity checks rather than the entire set of ordinary parity checks, the general decoding problem described in section 1.1a can be reduced to the simple forms given by Theorems 1-4. The reason for this simplification is as follows.

In general, there is a very complex relationship between the values of the $s_j$ in Eq. 6 and the corresponding most probable noise sequence $e_i$, $i = 1, 2, \ldots, n$. However, if the $s_j$ are linearly transformed into a set of parity checks orthogonal on $e_m$, there is a very simple relationship between the set $\{A_i\}$ of orthogonal parity checks and $e_m$: the $A_i$ conditioned on $e_m$ are a set of independent random variables. Thus the factorization of Eq. 18 can be carried out, and this permits each of the orthogonal parity checks to be treated separately in the process of computing the most probable value of $e_m$.

It remains to be shown that there exist codes for which the mapping from the ordinary parity checks to the set of orthogonal parity checks can be carried out in an efficient manner, that is, that the most probable value of $e_m$ obtained by one of the threshold decoding algorithms will coincide with the most probable value of $e_m$ with respect to the entire set of ordinary parity checks for a high-probability subset of possible values of the ordinary parity checks. The rest of this report will be devoted to this task.

In formulating the concept of threshold decoding, we have not restricted the choice of a finite field for the coded symbols. We shall hereafter restrict ourselves almost exclusively to the binary field, for two reasons: because this is a case of practical interest, and because, as we shall point out, there are difficulties in applying the method of threshold decoding to nonbinary codes in an efficient manner.
II. CONVOLUTIONAL ENCODING AND PARITY CHECKING


We shall now give a brief discussion of convolutional encoding that will include algebraic properties of such codes, bounds on code quality, and circuits for encoding and parity checking. Although this section is intended primarily to serve as a background for the following ones, some of the material presented is new. Sections 2.2, 2.4, and 2.7 represent original work. The random-coding bound of section 2.5 was first derived by Elias (who introduced convolutional coding at the same time), and we present merely a different method of proof. The Gilbert bound in section 2.3 was shown to hold for binary convolutional codes with rate $1/n_0$ by Wozencraft; we give a proof that applies to nonbinary codes and to codes with arbitrary rate. The encoding circuit of section 2.6a was first described by Wozencraft and Reiffen,[20] but the circuit of section 2.6b appears to be new.

Fig. 3. General convolutional encoder.

2.1 CONVOLUTIONAL ENCODING

A general convolutional encoder is shown in Fig. 3. Each unit of time, an information symbol enters each of the $k_0$ input lines, and an encoded symbol leaves each of the $n_0$ ($n_0 > k_0$) output lines. The symbols can be elements in any finite field, GF(q), and the rate $R = k_0/n_0$ then has the units of $\log_2 q$ bits per output symbol. All operations on the symbols are assumed hereafter to be carried out in GF(q) unless it is expressly stated to the contrary.

Using the delay operator, or D-notation, introduced by Huffman [21], we can represent the $k_0$ input sequences by the set of polynomials

$$I^{(j)}(D) = i_0^{(j)} + i_1^{(j)} D + i_2^{(j)} D^2 + \cdots, \qquad j = 1, 2, \ldots, k_0, \tag{26}$$

where $i_u^{(j)}$ is the information symbol that enters the $j$th input line of the encoder at time $u$. Similarly, the $n_0$ output sequences are denoted

$$T^{(j)}(D) = t_0^{(j)} + t_1^{(j)} D + t_2^{(j)} D^2 + \cdots, \qquad j = 1, 2, \ldots, n_0, \tag{27}$$

where $t_u^{(j)}$ is the symbol that leaves the $j$th output line at time $u$.

We assume, without loss of generality, that the code is in systematic form, that is, that the first $k_0$ output sequences are identical to the input sequences. In the D-notation, this property can be expressed as

$$T^{(j)}(D) = I^{(j)}(D), \qquad j = 1, 2, \ldots, k_0. \tag{28}$$

The defining property of a convolutional code is that the remaining $n_0 - k_0$ output sequences, or parity sequences, be linear combinations of the input sequences. That is,

$$T^{(j)}(D) = G^{(j)}(D)\, I^{(1)}(D) + H^{(j)}(D)\, I^{(2)}(D) + \cdots + Z^{(j)}(D)\, I^{(k_0)}(D), \qquad j = k_0+1, k_0+2, \ldots, n_0. \tag{29}$$

Each transmitted parity digit is thus a linear combination of the present and preceding information digits. The set of $k_0(n_0-k_0)$ polynomials

$$G^{(j)}(D),\ H^{(j)}(D),\ \ldots,\ Z^{(j)}(D), \qquad j = k_0+1, k_0+2, \ldots, n_0,$$

comprises the code-generating polynomials, and their choice specifies the code.
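As an illustration of Eqs. 28 and 29, here is a minimal Python sketch restricted to the binary case (q = 2). It is our own illustration, not a circuit from this report: polynomials in D are stored as integer bit masks (bit u holds the coefficient of $D^u$), a finite input sequence is treated as if it were followed by zeros, and the names `gf2_mul` and `encode` are hypothetical.

```python
def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials stored as bit masks (bit u <-> D^u)."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition of polynomials over GF(2) is XOR
        a <<= 1          # multiply a by D
        b >>= 1
    return result


def encode(info: list[int], gens: dict[int, list[int]]) -> list[int]:
    """Systematic convolutional encoding over GF(2), Eqs. (28)-(29).

    info[i]  -- the input sequence I^(i+1)(D), i = 0, ..., k0-1
    gens[j]  -- the k0 code-generating polynomials [G^(j), H^(j), ..., Z^(j)]
                for parity output line j, j = k0+1, ..., n0
    Returns [T^(1)(D), ..., T^(n0)(D)].
    """
    outputs = list(info)  # Eq. (28): the first k0 outputs copy the inputs
    for j in sorted(gens):
        parity = 0
        for poly, seq in zip(gens[j], info):
            parity ^= gf2_mul(poly, seq)  # one term of Eq. (29)
        outputs.append(parity)
    return outputs


# The R = 1/2 code of Eq. (33), G^(2)(D) = 1 + D + D^3, with I^(1)(D) = 1 + D^2 + D^6
t1, t2 = encode([0b1000101], {2: [0b1011]})
print(bin(t2 & 0x7F))  # 0b1100111: coefficients of D^0..D^6, read from the right
```

Reading the printed mask from the right gives $1 + D + D^2 + D^5 + D^6$, the leading terms of Eq. 36.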

Let $m$ be the largest of the degrees of the code-generating polynomials, that is, the largest power of $D$ that multiplies any of the input sequences in Eq. 29. Then any particular information symbol can affect the output sequences over a span of at most $m + 1$ time units. During this time span, a total of $(m+1)n_0$ symbols leaves the encoder, and hence the code is said to have a constraint length, $n_A$, of $(m+1)n_0$ symbols.

In the D-notation, the code-generating polynomials will be represented as

$$G^{(j)}(D) = g_0^{(j)} + g_1^{(j)} D + \cdots + g_m^{(j)} D^m, \tag{30}$$

where the coefficients are again elements of GF(q).

We define an initial code word of a convolutional code to be the first set of $n_A$ symbols output from the encoder. We shall use the notation

$$T_m^{(j)}(D) = t_0^{(j)} + t_1^{(j)} D + \cdots + t_m^{(j)} D^m, \qquad j = 1, 2, \ldots, n_0, \tag{31}$$

to represent the symbols in an initial code word. Similarly, we shall use the notation

$$I_m^{(j)}(D) = i_0^{(j)} + i_1^{(j)} D + \cdots + i_m^{(j)} D^m, \qquad j = 1, 2, \ldots, k_0, \tag{32}$$

to represent the information symbols over the same time span.

With this definition, the set of initial code words of a convolutional code forms a systematic linear code, or $(n, k)$ code, as defined in section 1.1. In making the correspondence with the notation of that section, $n_A$ is identified with $n$, and $Rn_A = (m+1)k_0$ is identified with $k$. In other words, there are $n_A$ symbols in an initial code word and, of these symbols, $Rn_A$ are information symbols.

These concepts are best clarified by an example. Consider the $R = 1/2$ binary convolutional code for which the code-generating polynomial is

$$G^{(2)}(D) = 1 + D + D^3. \tag{33}$$

For an arbitrary information sequence, $I^{(1)}(D)$, the encoded parity sequence, from Eq. 29, is given by

$$T^{(2)}(D) = (1 + D + D^3)\, I^{(1)}(D), \tag{34}$$

from which we see that any transmitted parity digit is the sum of the information symbols occurring at the same time and at one and three time units earlier. Hence if the input sequence were

$$I^{(1)}(D) = 1 + D^2 + D^6 + \cdots, \tag{35}$$

the output parity sequence is

$$T^{(2)}(D) = 1 + D + D^2 + D^5 + D^6 + \cdots. \tag{36}$$

The manner in which the parity bits are formed can be seen from Fig. 4. As the information bits move through the four-stage shift register, the fixed connections to the adder ensure that each parity bit formed is the sum of the current information bit and the information bits one and three time units earlier, which are stored in the second and fourth stages of the shift register.

Fig. 4. Encoder for the example.

Since the code-generating polynomial has degree $m = 3$ in this example, the initial code word is

$$T_m^{(1)}(D) = 1 + D^2 \quad\text{and}\quad T_m^{(2)}(D) = 1 + D + D^2. \tag{37}$$
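The shift-register operation of Fig. 4 can be mimicked in a few lines of Python. This is a sketch under our own assumptions: stage 1 holds the current information bit and stage s holds the bit s - 1 time units old, which matches the statement that the bits one and three time units earlier sit in the second and fourth stages; the function name `fig4_encoder` is hypothetical.

```python
def fig4_encoder(bits: list[int]) -> list[tuple[int, int]]:
    """Simulate the R = 1/2 encoder for G^(2)(D) = 1 + D + D^3.

    Returns one (information bit, parity bit) pair per time unit.
    """
    register = [0, 0, 0, 0]  # four stages, initially cleared
    out = []
    for b in bits:
        register = [b] + register[:-1]  # shift the new information bit in
        # adder taps: stage 1 (D^0), stage 2 (D^1) and stage 4 (D^3)
        parity = register[0] ^ register[1] ^ register[3]
        out.append((b, parity))
    return out


# Input 1 + D^2 + D^6 (Eq. 35), time units 0 through 6
pairs = fig4_encoder([1, 0, 1, 0, 0, 0, 1])
print([p for _, p in pairs])  # [1, 1, 1, 0, 0, 1, 1], i.e. 1 + D + D^2 + D^5 + D^6
```

The first four pairs, information bits 1, 0, 1, 0 and parity bits 1, 1, 1, 0, reproduce the initial code word of Eq. 37.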

From the preceding example, it should be clear that the initial code word defined by Eq. 31 can be obtained from (28) and (29) simply by dropping all terms in $D$ with exponents greater than $m$. Hence, we can write

$$T_m^{(j)}(D) = I_m^{(j)}(D), \qquad j = 1, 2, \ldots, k_0, \tag{38}$$

and

$$T_m^{(j)}(D) = \left\{G^{(j)}(D)\, I_m^{(1)}(D) + \cdots + Z^{(j)}(D)\, I_m^{(k_0)}(D)\right\}, \qquad j = k_0+1, k_0+2, \ldots, n_0. \tag{39}$$

Here and hereafter we use the brace notation to enclose polynomials that are to be expanded under the constraint that all terms in $D$ with exponents greater than $m$ are to be dropped from the resulting expressions. (The operation of Eq. 39 is exactly the same as specifying that the polynomial on the right be taken to be the minimum-degree polynomial in the residue class containing the polynomial within braces modulo the ideal generated by $F(D) = D^{m+1}$. The terms in this statement are defined in Appendix A.)
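The brace operation amounts to masking off every coefficient above $D^m$. The following is a minimal binary sketch of Eqs. 38 and 39 with polynomials stored as bit masks (bit u is the coefficient of $D^u$); the helper names are hypothetical and chosen for illustration only.

```python
def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials stored as bit masks (bit u <-> D^u)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def braces(poly: int, m: int) -> int:
    """The brace operation: drop every term of degree greater than m (reduce mod D^(m+1))."""
    return poly & ((1 << (m + 1)) - 1)


def initial_code_word(info: list[int], gens: dict[int, list[int]], m: int) -> list[int]:
    """Return [T_m^(1), ..., T_m^(n0)] per Eqs. (38)-(39), binary case."""
    words = [braces(seq, m) for seq in info]  # Eq. (38): I_m^(j)(D)
    for j in sorted(gens):
        parity = 0
        for poly, seq in zip(gens[j], words):
            parity ^= gf2_mul(poly, seq)
        words.append(braces(parity, m))  # Eq. (39)
    return words


# Example code of Eq. (33), m = 3, input 1 + D^2 + D^6 + ...
print([bin(w) for w in initial_code_word([0b1000101], {2: [0b1011]}, m=3)])
# ['0b101', '0b111'], i.e. T_m^(1) = 1 + D^2 and T_m^(2) = 1 + D + D^2, as in Eq. (37)
```

Truncating the inputs before multiplying gives the same result as truncating afterward, because input terms above $D^m$ only contribute to exponents above $m$.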

2.2 ALGEBRAIC STRUCTURE

We shall now make a closer investigation of the set of initial code words of a convolutional code as given by Eqs. 38 and 39. Our purpose is to derive certain properties of these code words which can be used later as a basis for proving the Gilbert bound and the random-coding bound for the class of convolutional codes. We begin by proving a lemma which seems quite abstract at this point, but whose utility will later become apparent.

LEMMA 1: Given an arbitrary polynomial $I_m^{(j)}(D)$ of degree $m$ or less such that $i_0^{(j)}$ is nonzero, the polynomials

$$\left\{G(D)\, I_m^{(j)}(D)\right\}$$

obtained for the $q^{m+1}$ choices of $G(D)$ as a polynomial of degree $m$ or less are all distinct, and hence take on once, and once only, the identity of each of the $q^{m+1}$ polynomials of degree $m$ or less.

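PROOF: Suppose there exists some polynomial $G^*(D)$ of degree $m$ or less which is such that

$$\left\{G^*(D)\, I_m^{(j)}(D)\right\} = \left\{G(D)\, I_m^{(j)}(D)\right\}. \tag{40}$$

Then it follows that

$$\left\{\left[G^*(D) - G(D)\right] I_m^{(j)}(D)\right\} = 0. \tag{41}$$

Equation 41 states that the polynomial on the left can have no terms of degree $m$ or less. Suppose $G^*(D) - G(D)$ is not zero, and let its lowest-degree nonzero term have degree $r$; since both $G^*(D)$ and $G(D)$ are of degree $m$ or less, $r \le m$. Because $i_0^{(j)}$, the zero-degree term of $I_m^{(j)}(D)$, is nonzero, the product $[G^*(D) - G(D)]\, I_m^{(j)}(D)$ then has a nonzero term of degree $r$ (its coefficient is the product of two nonzero field elements), which contradicts (41). Hence $G^*(D) = G(D)$. Thus each choice of $G(D)$ yields a distinct $\{G(D)\, I_m^{(j)}(D)\}$, and this proves the lemma.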
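For small $m$ the lemma can be checked exhaustively. The sketch below covers only the binary case (q = 2), stores polynomials as bit masks, and uses the hypothetical helper `lemma1_holds`; it also confirms the converse, namely that the map fails to be one-to-one when $i_0^{(j)} = 0$ (then $G(D) = D^m$ and $G(D) = 0$ give the same truncated product).

```python
def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials stored as bit masks (bit u <-> D^u)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def lemma1_holds(info_m: int, m: int) -> bool:
    """True if G -> {G(D) I_m(D)} is one-to-one over all G of degree <= m (q = 2)."""
    mask = (1 << (m + 1)) - 1
    images = {gf2_mul(g, info_m) & mask for g in range(1 << (m + 1))}
    return len(images) == 1 << (m + 1)


for m in range(5):
    for info_m in range(1 << (m + 1)):
        # one-to-one exactly when the D^0 coefficient i_0 is nonzero
        assert lemma1_holds(info_m, m) == bool(info_m & 1)
print("Lemma 1 checked for m = 0, ..., 4")
```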

We have seen that a convolutional code is specified by the choice of $(n_0-k_0)k_0$ code-generating polynomials. Since a polynomial of degree $m$ or less can be chosen in any of $q^{m+1}$ ways, there are exactly $q^{(m+1)(n_0-k_0)k_0}$ distinct convolutional codes with rate $R = k_0/n_0$ and constraint length $n_A = (m+1)n_0$. (We allow any generating polynomials of degree $m$ or less, and do not require that any of the generating polynomials have a nonzero term in $D^m$.) The utility of Lemma 1 is that it permits one to determine in exactly how many convolutional codes a specified initial code word can appear, and this, in turn, can be used as a basis for proving that the class of convolutional codes meets the Gilbert and random-coding bounds.

It is convenient to use the following nomenclature: The set of $k_0$ symbols input to the encoder at time zero, namely $i_0^{(1)}, i_0^{(2)}, \ldots, i_0^{(k_0)}$, will be called the set of first information symbols. Since, according to Eq. 28, $t_0^{(j)} = i_0^{(j)}$ for $j = 1, 2, \ldots, k_0$, the following theorem gives the number of distinct codes in which some initial code word, with a specified set of first information symbols, appears.

THEOREM 5: Given an initial code word $T_m^{(j)}(D)$, $j = 1, 2, \ldots, n_0$, for which at least one of the symbols $t_0^{(j)}$ ($j = 1, 2, \ldots, k_0$) is nonzero, this first code word appears in exactly $q^{(m+1)(n_0-k_0)(k_0-1)}$ distinct convolutional codes with rate $R = k_0/n_0$ and constraint length $n_A = (m+1)n_0$.

PROOF 5: The assumption that at least one of the $t_0^{(j)}$, $j = 1, 2, \ldots, k_0$, is nonzero implies that at least one of the set of first information symbols is nonzero. Without loss of generality, we can assume that $i_0^{(1)} \neq 0$.

From (38), it follows that prescribing an initial code word fixes the polynomials $I_m^{(j)}(D)$, $j = 1, 2, \ldots, k_0$. Consider the parity sequences that are given by (39). Equation 39 can be written in the form

$$T_m^{(j)}(D) - \left\{H^{(j)}(D)\, I_m^{(2)}(D) + \cdots + Z^{(j)}(D)\, I_m^{(k_0)}(D)\right\} = \left\{G^{(j)}(D)\, I_m^{(1)}(D)\right\} \tag{42}$$

for $j = k_0+1, k_0+2, \ldots, n_0$. Now for any fixed choice of the polynomials $H^{(j)}(D), \ldots, Z^{(j)}(D)$, the left-hand side of this equation is a fixed polynomial. It then follows from Lemma 1 that this equation will be satisfied for one and only one choice of $G^{(j)}(D)$. Since each of the $(k_0-1)(n_0-k_0)$ polynomials $H^{(j)}(D), \ldots, Z^{(j)}(D)$, $j = k_0+1, k_0+2, \ldots, n_0$, can be chosen in any of $q^{m+1}$ ways as a polynomial of degree $m$ or less, it follows that the specified code word appears in exactly $q^{(m+1)(k_0-1)(n_0-k_0)}$ distinct codes, and this proves the theorem.
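Theorem 5 can be confirmed by brute force for a small binary case. In the sketch below (q = 2 only; the function name `codes_containing` is hypothetical) every code with $k_0 = 2$, $n_0 = 3$, $m = 1$ is enumerated, and each candidate initial code word with a nonzero first information symbol is found in exactly $2^{(m+1)(n_0-k_0)(k_0-1)} = 4$ of the 16 codes.

```python
from itertools import product


def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials stored as bit masks (bit u <-> D^u)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def codes_containing(word: list[int], k0: int, n0: int, m: int) -> int:
    """Count the binary codes (rate k0/n0, degree <= m) having `word` as an initial code word.

    word = [T_m^(1), ..., T_m^(n0)], each a bit mask of degree <= m.
    """
    mask = (1 << (m + 1)) - 1
    polys = range(1 << (m + 1))  # every GF(2) polynomial of degree <= m
    count = 0
    # A code is one choice of k0 polynomials for each of the n0 - k0 parity lines.
    for choice in product(polys, repeat=k0 * (n0 - k0)):
        matches = True
        for p in range(n0 - k0):
            gens = choice[p * k0:(p + 1) * k0]
            parity = 0
            for g, seq in zip(gens, word[:k0]):
                parity ^= gf2_mul(g, seq)
            if parity & mask != word[k0 + p]:
                matches = False
                break
        count += matches
    return count


k0, n0, m = 2, 3, 1
expected = 2 ** ((m + 1) * (n0 - k0) * (k0 - 1))  # Theorem 5 with q = 2
for word in product(range(1 << (m + 1)), repeat=n0):
    if any(w & 1 for w in word[:k0]):  # some first information symbol is nonzero
        assert codes_containing(list(word), k0, n0, m) == expected
print(expected)  # 4
```

With $k_0 = 1$ the exponent vanishes and each initial code word lies in exactly one code, which is Lemma 1 restated.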

We shall now show how Theorem 5 can be used to prove the Gilbert bound for convolutional codes.

2.3 THE GILBERT BOUND

The minimum distance, $d$, of a convolutional code is defined to be the smallest number of symbols in which two initial code words that do not have identical sets of first information symbols differ. Since the set of initial code words forms a linear code and hence forms a group, it follows that the minimum distance is also equal to the minimum weight (that is, the number of nonzero symbols) of an initial code word that has at least one nonzero first information symbol. (Minimum distance $d$ does not imply that there exist two infinitely long transmitted sequences, with different sets of first information symbols, that are distance $d$ apart; rather, there are two such sequences whose first $n_A$ symbols are distance $d$ apart.) We now prove a lower bound on minimum distance for convolutional codes.

THEOREM 6: Given a rate $R = k_0/n_0$ and a constraint length $n_A = (m+1)n_0$, there exists at least one convolutional code with minimum distance at least $d$, where $d$ is the largest integer for which

$$\sum_{j=1}^{d-1} \binom{n_A}{j} (q-1)^j < q^{n_A(1-R)}. \tag{43}$$


PROOF 6: A convolutional code has minimum distance at least $d$ if it has no initial code word of weight $d - 1$ or less for which some first information symbol is nonzero. Thus, if in the set of all codes of length $n_A$ we count the total number of initial code words with some nonzero first information symbol which have weight $d - 1$ or less, and if this total is less than the number of codes, then there must be at least one code with minimum distance $d$ or greater.

Since there are $n_A$ symbols in an initial code word, it follows that when

$$\begin{pmatrix}\text{number of nonzero}\\ n_A\text{-tuples with}\\ \text{weight } d-1 \text{ or less}\end{pmatrix} \times \begin{pmatrix}\text{maximum number of distinct}\\ \text{codes in which each initial}\\ \text{code word with at least one}\\ \text{nonzero first information}\\ \text{symbol can appear}\end{pmatrix} < \begin{pmatrix}\text{total number}\\ \text{of distinct}\\ \text{codes}\end{pmatrix}, \tag{44}$$

there must exist at least one code that has minimum distance $d$ or greater. From the fact that the $(n_0-k_0)k_0$ polynomials that specify a distinct convolutional code can each be chosen in $q^{m+1}$ ways as polynomials of degree $m$ or less, it follows that there are exactly $q^{(m+1)(n_0-k_0)k_0}$ convolutional codes. Thus with the aid of Theorem 5, (44) can be written

$$\left[\sum_{j=1}^{d-1} \binom{n_A}{j} (q-1)^j\right] q^{(m+1)(n_0-k_0)(k_0-1)} < q^{(m+1)(n_0-k_0)k_0} \tag{45}$$

or

$$\sum_{j=1}^{d-1} \binom{n_A}{j} (q-1)^j < q^{(m+1)(n_0-k_0)} = q^{n_A(1-R)}, \tag{46}$$

and this is the result stated in the theorem.
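The integer $d$ of Eq. 43 is easy to compute exactly. Here is a minimal sketch (the name `gilbert_distance` is hypothetical) that keeps $q^{n_A(1-R)} = q^{(m+1)(n_0-k_0)}$ as an exact integer and accumulates the sum until the inequality fails.

```python
from math import comb


def gilbert_distance(m: int, k0: int, n0: int, q: int = 2) -> int:
    """Largest d with sum_{j=1}^{d-1} C(n_A, j) (q-1)^j < q^{n_A(1-R)}, Eq. (43)."""
    if not 1 <= k0 < n0:
        raise ValueError("need 1 <= k0 < n0")
    n_a = (m + 1) * n0                   # constraint length n_A
    limit = q ** ((m + 1) * (n0 - k0))   # q^{n_A(1-R)} as an exact integer
    total, d = 0, 0
    while total < limit:                 # (43) still holds for d + 1
        d += 1
        total += comb(n_a, d) * (q - 1) ** d  # total = sum_{j=1}^{d}
    return d  # the sum up to d - 1 is below the limit, the sum up to d is not


for m in (3, 7, 11):
    print(m, gilbert_distance(m, k0=1, n0=2))  # prints 3 2, then 7 3, then 11 4
```

For the $R = 1/2$ example code ($m = 3$, $n_A = 8$) the bound guarantees only $d \ge 2$; the loop always terminates because the full sum $q^{n_A} - 1$ exceeds the limit whenever $k_0 \ge 1$.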

The bound of Theorem 6 is equivalent to the Gilbert bound on minimum distance for block codes with the same rate and with length equal to the constraint length $n_A$.

COROLLARY: There exists at least one binary convolutional code with rate $R$ and constraint length $n_A$ with minimum distance at least $d$, where $d \le n_A/2$ is the largest integer for which

$$H(d/n_A) < 1 - R, \tag{47}$$

where $H(x) = -x \log_2 x - (1-x) \log_2 (1-x)$ is the entropy function.

PROOF: For $q = 2$, (43) reduces to

$$\sum_{j=1}^{d-1} \binom{n_A}{j} < 2^{n_A(1-R)}. \tag{48}$$

For $d \le n_A/2$, the summation on the left can be overbounded as

$$\sum_{j=1}^{d-1} \binom{n_A}{j} < 2^{n_A H(d/n_A)}. \tag{49}$$

Hence if

$$2^{n_A H(d/n_A)} < 2^{n_A(1-R)} \tag{50}$$

or

$$H(d/n_A) < 1 - R, \tag{51}$$

then (48) is satisfied, and there exists at least one convolutional code with minimum distance $d$ or greater, as was to be shown.
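The asymptotic form (47) can be evaluated the same way. In this sketch (function names hypothetical) $d$ is restricted to $d \le n_A/2$, the range in which $H$ is increasing and the overbound (49) applies, so scanning upward and stopping at the first failure finds the largest admissible $d$.

```python
from math import log2


def entropy(x: float) -> float:
    """Binary entropy function H(x) in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def gilbert_distance_asymptotic(n_a: int, rate: float) -> int:
    """Largest d <= n_A/2 with H(d/n_A) < 1 - R, Eq. (47)."""
    d = 0
    while 2 * (d + 1) <= n_a and entropy((d + 1) / n_a) < 1 - rate:
        d += 1
    return d


print(gilbert_distance_asymptotic(16, 0.5), gilbert_distance_asymptotic(24, 0.5))  # 1 2
```

At these short lengths the entropy form is noticeably weaker than the exact sum of Eq. 43, which gives $d = 3$ for $n_A = 16$ and $d = 4$ for $n_A = 24$ at $R = 1/2$.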

Inequality (47) is the usual asymptotic form of the Gilbert bound and will be used in Section III as a measure of quality for specific convolutional codes.


2.4 AN UPPER BOUND ON MINIMUM DISTANCE

In a manner similar to that used by Plotkin [24] for block codes, we can compute the average weight of an initial code word for which at least one of the first set of information symbols is nonzero, and then use the average weight to bound the minimum distance. We proceed as follows.

Since the set of initial code words of a convolutional code forms a linear code, and hence forms a group (cf. sec. 1.1), it follows that each element of GF(q) appears in a given code position in exactly $1/q$ of the code words. (We assume that for each $j$, at least one of the symbols $g_0^{(j)}, h_0^{(j)}, \ldots, z_0^{(j)}$ is nonzero. This is a necessary and sufficient condition that none of the positions in an initial code word will contain only "zeros" for all code words, that is, that there is no "idle" position.) This fact can be verified as follows: Adding any code word that has the element $-\beta$ in the given position to all of the code words must reproduce the original set of code words in some order, because of the closure property of a group. There will then be as many code words with zero in the given position as there were originally with $\beta$ in that position. Since $\beta$ is arbitrary, it follows that all elements of GF(q) must appear in the given position the same number of times in the set of code words. Thus, since there are $q^{Rn_A}$ code words in the set of initial code words and each has $n_A$ positions, it follows that the total number of nonzero positions in the entire set of initial code words is $[(q-1)/q]\, n_A q^{Rn_A}$.

The set of all initial code words for which all of the first information symbols are zero forms a subgroup, because the sum of any two such words is another such code word. From Eqs. 38 and 39, it follows that $t_0^{(j)} = 0$, $j = 1, 2, \ldots, n_0$, for these code words. By the same argument as that given above, each element of GF(q) will appear in any given one of the remaining $n_A - n_0$ code positions in exactly $1/q$ of this set of initial code words. Since this subgroup contains $q^{R(n_A-n_0)}$ members, it follows that there is a total of $[(q-1)/q](n_A - n_0)\, q^{R(n_A-n_0)}$ nonzero entries in the set of all such initial code words.

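The average weight of a code word for which at least one of the set of first information symbols is nonzero is given by

$$w_{\text{avg}} = \frac{\begin{pmatrix}\text{number of nonzero positions}\\ \text{in the set of all initial code words}\end{pmatrix} - \begin{pmatrix}\text{number of nonzero positions in the set}\\ \text{of all code words for which all first}\\ \text{information symbols are zero}\end{pmatrix}}{\begin{pmatrix}\text{number of initial}\\ \text{code words}\end{pmatrix} - \begin{pmatrix}\text{number of initial code words for which}\\ \text{all first information symbols are zero}\end{pmatrix}}.$$

Using the preceding results, we can write this equation

$$w_{\text{avg}} = \frac{\dfrac{q-1}{q}\left[n_A q^{Rn_A} - (n_A - n_0)\, q^{R(n_A-n_0)}\right]}{q^{Rn_A} - q^{R(n_A-n_0)}}. \tag{52}$$

Dividing the numerator and the denominator by $q^{R(n_A-n_0)}$, we can reduce Eq. 52 to

$$w_{\text{avg}} = \frac{q-1}{q}\left(n_A + \frac{n_0}{q^{Rn_0} - 1}\right). \tag{53}$$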


Finally, since the minimum-weight code word with some nonzero first information symbol must have integer weight, which in turn must be no greater than the average weight, we have the following result.

THEOREM 7: The minimum distance, $d$, of a convolutional code satisfies the inequality

$$d \le \left\lfloor \frac{q-1}{q}\left(n_A + \frac{n_0}{q^{Rn_0} - 1}\right) \right\rfloor. \tag{54}$$
COROLLARY: The minimum distance, $d$, of a binary convolutional code with $R = 1/n_0$ satisfies

$$d \le \left\lfloor \tfrac{1}{2}(n_A + n_0) \right\rfloor. \tag{55}$$

The corollary is obtained by substituting $q = 2$ and $Rn_0 = 1$ in (54). Inequality (55) is the form of the upper bound on minimum distance which will be useful in Section III.

Inequalities (54) and (55) do not give good upper bounds on minimum distance for arbitrary $n_A$ because, for large $n_A$, these bounds are roughly independent of the rate $R$. We are interested in these bounds for the following reason: in section 3.6 we shall demonstrate the existence of a class of codes with prescribed values of $n_A$ and $n_0$ for which (55) is satisfied with the equality sign.
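The averaging argument can be checked numerically for the $R = 1/2$ example code of Eq. 33. The sketch below (hypothetical helper names; binary case only) enumerates the initial code words with $i_0^{(1)} = 1$ and compares their average weight with Eq. 53. It assumes $g_0 = 1$, which is the no-idle-position condition stated above for a rate-$1/2$ code.

```python
def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials stored as bit masks (bit u <-> D^u)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def weight_profile(g: int, m: int) -> tuple[int, float]:
    """Min and average weight of the initial code words with i_0 = 1 for the
    binary R = 1/2 code whose generating polynomial G(D) is the bit mask g."""
    mask = (1 << (m + 1)) - 1
    weights = []
    for info in range(1 << (m + 1)):
        if info & 1:  # the first information symbol is nonzero
            parity = gf2_mul(g, info) & mask  # Eq. (39)
            weights.append(bin(info).count("1") + bin(parity).count("1"))
    return min(weights), sum(weights) / len(weights)


def average_weight(n_a: int, n0: int, k0: int, q: int = 2) -> float:
    """Eq. (53), using R n0 = k0."""
    return (q - 1) / q * (n_a + n0 / (q ** k0 - 1))


m, n0 = 3, 2
n_a = (m + 1) * n0
d, avg = weight_profile(0b1011, m)  # G(D) = 1 + D + D^3
print(d, avg, average_weight(n_a, n0, 1), (n_a + n0) // 2)  # 4 5.0 5.0 5
```

The example code thus has $d = 4$, below the bound $\lfloor (8 + 2)/2 \rfloor = 5$ of (55). Every rate-$1/2$ generator with $g_0 = 1$ and $m = 3$ gives the same average weight of 5, as Eq. 53 predicts.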

2.5 RANDOM-CODING BOUND

We now turn our attention to computing the error probabilities that can be attained at the receiver when data are transmitted over a binary symmetric channel after convolutional encoding. Rather than handle the problem directly, we shall prove a more general random-coding bound that can be applied to the ensemble of convolutional codes, as well as to several other ensembles of linear codes.

Let us focus our attention on binary $(n, k)$ codes. For any such code the set of $2^k$ possible information sequences forms a group, the group of all binary $k$-tuples. We define a group partition to be a mapping of this group of $k$-tuples into any subgroup and its proper cosets (cf. Appendix A). As an example, take the case $k = 2$. One possible group partition is then

$H = [00, 01]$

$C_1 = [10, 11]$.

Here we have taken the subgroup $H$ to be the 2-tuples whose first bit is a zero, in which case there is only a single proper coset, $C_1$, which is the set of 2-tuples whose first bit is a one.

Next we observe that when the parity sequences are added to the information places, the resulting $n$-tuples may still be assigned to a subgroup of the group of code words and its proper cosets, with the information sequences being mapped in the same manner as was done in the original group partition. If we continue the example above by setting $n = 4$ and specifying that the first parity bit be equal to the first information bit and that the second parity bit be equal to the sum of the two information bits, then we obtain

$H' = [0000, 0101]$

$C'_1 = [1011, 1110]$.

The group partition of the information sequences is nothing more than the "standard array" of a subgroup and its cosets, which was first introduced into coding theory by Slepian. We use a different nomenclature for the following reason. In Slepian's standard array, the subgroup is the group of code words, and the parent group is the group of all $n$-tuples. When the parity sequences are attached to the information sequences in a group partition, the subgroup is the set of code words whose information sequences formed the subgroup in the original group partition, and the parent group is the group of all code words.

We are interested in group partitions for the following reasons. First, the set of distances between members of a proper coset is the same as the set of weights of the members of the subgroup. This follows trivially from the definition of a coset given in Appendix A. Second, the set of distances from any member of one coset to the members of a second coset is the same as the set of weights of the members in some fixed proper coset, and does not depend on the particular choice of the member from the first coset. Again, this follows from the fact that the members in any coset differ from each other by the members in the subgroup. For example, in the code used in the previous example, the members of a coset are distance 2 apart, but any member of $C'_1$ is distance 3 from any member of $H'$, since all code words in the only proper coset have weight 3.
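The distance claims for the (4, 2) example can be verified directly. This short sketch rebuilds $H'$ and $C'_1$ from the encoding rule given in the text (first parity bit equal to the first information bit, second parity bit equal to their sum); the helper names are our own.

```python
from itertools import combinations


def encode(a: int, b: int) -> tuple[int, int, int, int]:
    """The (4, 2) code of the example: information bits a, b; parity bits a and a + b (mod 2)."""
    return (a, b, a, a ^ b)


def distance(x: tuple[int, ...], y: tuple[int, ...]) -> int:
    """Hamming distance between two equal-length tuples."""
    return sum(u != v for u, v in zip(x, y))


subgroup = [encode(0, b) for b in (0, 1)]  # H': first information bit 0
coset = [encode(1, b) for b in (0, 1)]     # C'_1: first information bit 1

within = {distance(x, y) for group in (subgroup, coset) for x, y in combinations(group, 2)}
between = {distance(x, y) for x in subgroup for y in coset}
print(within, between)  # {2} {3}
```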

In Appendix B, we prove the following upper bound on the error probability that can be achieved in deciding to which coset of a group partition the transmitted information sequence belongs when the code words are transmitted through a binary symmetric channel.

THEOREM 8: Given an ensemble of equiprobable binary $(n, k)$ codes and a group partition of the information sequences with the property that an information sequence in any proper coset has equal probability of being assigned any of the $2^{n-k}$ possible parity sequences, after transmission through a binary symmetric channel at any rate $R = k/n$ less than the channel capacity $C$, the coset to which the transmitted information sequence belongs can be determined with an average error probability, $P(e)$, that satisfies

$$P(e) < K\, 2^{-na}, \tag{56}$$

where $K$ and $a$ are the coefficient and exponent, respectively, in the usual random-coding bound. (See Appendix B for precise values.)

The first error probability of a convolutional code, $P_1(e)$, is defined to be the average error probability in decoding the set of first information symbols $i_0^{(j)}$, $j = 1, 2, \ldots, k_0$, given the first set of $n_A$ received bits. With this definition we have the following corollary as an immediate consequence of Theorem 8.

COROLLARY 1: The average error probability, $P_1(e)$, for the ensemble of binary convolutional codes satisfies (56), with $n_A$ replacing $n$.

PROOF: We have seen (Section II) that the set of initial code words of a convolutional code forms an $(n, k)$ code, where $n = n_A$ and $k = (m+1)k_0$. Consider now the following group partition: Take as the subgroup the set of all information sequences for which the set of all first information symbols are zero, that is, $i_0^{(j)} = 0$, $j = 1, 2, \ldots, k_0$. This subgroup has $2^{k_0} - 1$ proper cosets, in which each proper coset contains all of the information sequences that have the same set of first information symbols, not all of which are zeros. Given this group partition, Theorem 5 implies that over the ensemble of all convolutional codes, any information sequence in a proper coset has equal probability of being assigned any possible parity sequence. Thus all the conditions of Theorem 8 are satisfied, and (56) applies to the ensemble of binary convolutional codes.

We have thus established that, for the ensemble of convolutional codes, there must be at least one code for which the set of first information symbols can be determined with an error probability that satisfies (56). It will be shown in section 4.1 that decoding for convolutional codes can be done sequentially by using the same decoding algorithm to determine the subsequent sets of information symbols as was used to determine the set of first information symbols. Thus, if all previous sets have been correctly decoded, the next set of information symbols can also be determined with an error probability that satisfies (56).

Although this report is not primarily concerned with other ensembles of codes, we shall apply Theorem 8 to Wozencraft's ensemble of randomly shifted codes (cf. Section I) to illustrate its general usefulness in proving the random-coding bound for ensembles of linear codes.

COROLLARY 2: The block error probability, $P(e)$, for the ensemble of randomly shifted codes satisfies (56).

PROOF: The randomly shifted codes can be described in the following manner. Consider any maximal-length shift register of $N$ stages, where $N$ is the maximum of $k$ and $n - k$. If any nonzero initial conditions are placed in this shift register and it is then shifted some number of times, say $M$, a distinct nonzero $N$-tuple will remain in the shift register for each choice of $M = 1, 2, \ldots, 2^N - 1$. Coding for the randomly shifted codes is done as follows: The $k$ information bits are placed in the $k$ lowest-order positions of the shift register, which is then shifted some fixed number of times, $M$. The contents of the $n - k$ lowest-order positions of the shift register are then attached as the parity sequence. Since the encoding circuit is linear, the code is a true $(n, k)$ code. There are $2^N$ codes in the ensemble, one for each choice of $M = 0, 1, \ldots, 2^N - 1$; here we define the case $M = 0$ to be the case for which the shift register contains only zeros. Now consider the following group partition of the information sequences: The all-zero sequence is the subgroup and each proper coset contains a single distinct nonzero information sequence. For a nonzero information sequence, the register passes through every nonzero $N$-tuple exactly once as $M$ ranges over $1, 2, \ldots, 2^N - 1$, and $M = 0$ supplies the all-zero $N$-tuple; hence the $n - k$ lowest-order positions take each of their $2^{n-k}$ possible values for exactly $2^{N-(n-k)}$ choices of $M$. Thus any nonzero information sequence has equal probability of being assigned any parity sequence if all choices of $M$ are equiprobable. The conditions of Theorem 8 are therefore satisfied, and hence the transmitted information sequence can be determined with an error probability that satisfies (56). (This ensemble of codes is interesting in that it contains the fewest codes of any ensemble to which the random-coding bound has yet been shown to apply.)
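The equiprobability step in the proof can be checked for a small register. The sketch below assumes a 4-stage Galois-form register with feedback polynomial $x^4 + x + 1$ (a primitive polynomial, so the register has maximal length); that choice, the bit ordering, and the function names are our own illustrative assumptions, not taken from Wozencraft's construction.

```python
from collections import Counter

N = 4               # register length, N = max(k, n - k)
FEEDBACK = 0b10011  # assumed feedback polynomial x^4 + x + 1 (primitive)


def shift(state: int) -> int:
    """One shift of a Galois-form register: multiply by x modulo x^4 + x + 1."""
    state <<= 1
    if state & (1 << N):
        state ^= FEEDBACK
    return state


def parity(info: int, k: int, n: int, M: int) -> int:
    """Parity sequence assigned to `info` (a k-bit integer) by the code with shift count M."""
    if M == 0:
        return 0  # by definition, code M = 0 leaves only zeros in the register
    state = info  # the k information bits occupy the k lowest-order positions
    for _ in range(M):
        state = shift(state)
    return state & ((1 << (n - k)) - 1)  # keep the n - k lowest-order positions


# (n, k) = (6, 4): over the 2^N = 16 codes, each nonzero information sequence
# should receive each of the 2^(n-k) = 4 parity sequences exactly 4 times.
for info in range(1, 1 << 4):
    counts = Counter(parity(info, 4, 6, M) for M in range(1 << N))
    assert sorted(counts.values()) == [4, 4, 4, 4]
print("parity sequences equiprobable for every nonzero information sequence")
```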

It is interesting that the same basic theorem can be used to prove the random-coding bound for the ensemble of convolutional codes and an ensemble of block codes. Theorem 8 can also be used to prove that the ensemble of sliding parity-check codes and the ensemble of all systematic codes satisfy the random-coding bound. The proofs for these last two ensembles are very similar to the proof of Corollary 2 and hence will not be given here.

2.6 ENCODING CIRCUITS

We stated in Section I that the first coding problem is the construction of good codes that are readily instrumented. The class of convolutional codes satisfies both the Gilbert and random-coding bounds, and hence can certainly be classified as a "good" class of codes. We shall show also that these codes satisfy the requirement of being readily instrumented.

One of the principal advantages of the polynomial approach to convolutional encoding, which we have adopted in this section, is that it leads naturally to the specification of encoding circuits. The encoding equations, (28) and (29), are already in convenient delay-operator form.

a. $Rn_A$-Stage Encoder

A circuit for carrying out the operations of Eqs. 28 and 29 can be synthesized as shown in Fig. 5. In this circuit, each of the $k_0$ information sequences, $I^{(j)}(D)$, $j = 1, 2, \ldots, k_0$, is fed into a separate shift-register chain of $m$ stages, as well as being fed directly to one of the output terminals. The $n_0 - k_0$ parity sequences, $T^{(j)}(D)$, $j = k_0+1, k_0+2, \ldots, n_0$, are formed by the adders outside the shift-register chains. Since there are $m$ stages in each of the $k_0$ shift-register chains, there is a total of $mk_0 = R(n_A - n_0)$ stages, or approximately $Rn_A$ stages, of shift register in this encoder. We shall refer to this type of encoder as an $Rn_A$-stage encoder.

The output of the $Rn_A$-stage encoder can be verified to conform to Eqs. 28 and 29 in the following manner: Assume that $i_0^{(1)} = 1$ and that all other input symbols are zeros. Then the output on line $j$ is readily checked to be $G^{(j)}(D)$ for $j = k_0 + 1, k_0 + 2, \ldots, n_0$, and these are the polynomials that multiply the first information sequence in (29). Similarly, a single "one" input on any other input line gives as outputs the polynomials associated with that input sequence in (29). The linearity of the circuit then guarantees that the output will be correct for arbitrary input sequences.
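As a software illustration of the $Rn_A$-stage encoder, the Python sketch below keeps one $m$-stage register per information sequence and forms each parity symbol as a modulo-2 sum of tapped stages, i.e., Eq. 29 over the binary field. The function name, the list-of-bits representation, and the layout of `gens` (`gens[p][i]` is the coefficient list, lowest power of $D$ first, of the polynomial multiplying information sequence $i$ in parity sequence $p$) are our own assumptions rather than part of the circuit description. The usage lines repeat the impulse test described above for the rate-2/3 code with $G^{(3)}(D) = 1 + D$ and $H^{(3)}(D) = 1 + D^2$.

```python
def rn_a_stage_encode(info_seqs, gens, m):
    """Sketch of a binary Rn_A-stage encoder (one m-stage register per input).

    info_seqs: k0 lists of bits; info_seqs[i][u] is i_u^{(i+1)}.
    gens: gens[p][i] lists the coefficients (D^0 first, degree <= m) of the
          polynomial multiplying information sequence i in parity sequence p.
    Returns the n0 output sequences: k0 information sequences, then the parity.
    """
    k0 = len(info_seqs)
    regs = [[0] * m for _ in range(k0)]  # regs[i][d] holds i_{u-1-d}
    parity = [[] for _ in gens]
    for u in range(len(info_seqs[0])):
        current = [seq[u] for seq in info_seqs]
        for p, row in enumerate(gens):
            bit = 0
            for i, poly in enumerate(row):
                taps = [current[i]] + regs[i]  # taps[d] holds i_{u-d}
                for d, coeff in enumerate(poly):
                    bit ^= coeff & taps[d]  # modulo-2 adder outside the registers
            parity[p].append(bit)
        # every register shifts once per time unit
        regs = [([current[i]] + regs[i])[:m] for i in range(k0)]
    return [list(seq) for seq in info_seqs] + parity


# Impulse test for G(D) = 1 + D, H(D) = 1 + D^2 (n0 = 3, k0 = 2, m = 2).
gens = [[[1, 1], [1, 0, 1]]]
print(rn_a_stage_encode([[1, 0, 0, 0], [0, 0, 0, 0]], gens, 2)[2])  # [1, 1, 0, 0]
print(rn_a_stage_encode([[0, 0, 0, 0], [1, 0, 0, 0]], gens, 2)[2])  # [1, 0, 1, 0]
```

The two impulse responses are the coefficient lists of $1 + D$ and $1 + D^2$, as the argument above requires.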

b. $(1-R)n_A$-Stage Encoder

A second circuit that performs the operations of Eqs. 28 and 29 is shown in Fig. 6. This circuit has a shift-register chain of $m$ stages for each of the $n_0 - k_0$ parity sequences, and thus contains a total of $m(n_0 - k_0) = (1-R)(n_A - n_0)$, or approximately $(1-R)n_A$, stages of shift register. We shall refer to this circuit as the $(1-R)n_A$-stage encoder. The adders in this circuit are placed between stages of the shift-register chains, and hence no adder has more than $k_0 + 1$ inputs. This can be an important feature in high-speed digital circuits in which delay in the logical elements becomes important.

Fig. 6. $(1-R)n_A$-stage encoder.

The output of the $(1-R)n_A$-stage encoder can be verified to conform to Eqs. 28 and 29 in exactly the same manner as was described for the $Rn_A$-stage encoder.
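The $(1-R)n_A$-stage structure can be modeled in the same spirit. In the Python sketch below (function names and data layout are our own assumptions; `gens[p][i]` holds the coefficients, lowest power of $D$ first, of the polynomial multiplying information sequence $i$ in parity sequence $p$), each parity sequence has one $m$-stage register. At every time unit the stage contents move one place toward the output, and each adder combines the following stage with its $k_0$ tapped information bits, so no adder has more than $k_0 + 1$ inputs. A direct evaluation of Eq. 29 is included for comparison.

```python
def one_minus_r_stage_encode(info_seqs, gens, m):
    """Sketch of a binary (1-R)n_A-stage encoder: one m-stage register per parity
    sequence, with a modulo-2 adder placed in front of every stage."""
    parity = []
    for row in gens:
        polys = [poly + [0] * (m + 1 - len(poly)) for poly in row]  # pad to m + 1
        state = [0] * m
        out = []
        for u in range(len(info_seqs[0])):
            x = [seq[u] for seq in info_seqs]
            # tapped[d] = sum over i of g_d^{(i)} x_u^{(i)}: the k0 adder inputs at position d
            tapped = [0] * (m + 1)
            for poly, bit in zip(polys, x):
                for d in range(m + 1):
                    tapped[d] ^= poly[d] & bit
            out.append(tapped[0] ^ (state[0] if m else 0))
            # each adder combines the following stage with its k0 tapped inputs
            state = [(state[k + 1] if k + 1 < m else 0) ^ tapped[k + 1] for k in range(m)]
        parity.append(out)
    return [list(seq) for seq in info_seqs] + parity


def direct_parity(info_seqs, row):
    """Eq. 29 evaluated directly: t_u = sum_i sum_d g_d^{(i)} i_{u-d}^{(i)} (mod 2)."""
    return [
        sum(poly[d] * seq[u - d] for poly, seq in zip(row, info_seqs)
            for d in range(len(poly)) if u - d >= 0) % 2
        for u in range(len(info_seqs[0]))
    ]


gens = [[[1, 1], [1, 0, 1]]]  # G^(3)(D) = 1 + D, H^(3)(D) = 1 + D^2
info = [[1, 0, 1, 1, 0], [0, 1, 1, 0, 1]]
print(one_minus_r_stage_encode(info, gens, 2)[2], direct_parity(info, gens[0]))
# [1, 0, 0, 1, 1] [1, 0, 0, 1, 1]
```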

c.  Serializing  of  Symbols 

The output symbols of both the $Rn_A$-stage and the $(1-R)n_A$-stage encoders occur in sets of $n_0$ symbols each time unit, one on each of the $n_0$ output lines. When it is desired to transmit these symbols over a single communication channel, they must be serialized to occur at the rate of one symbol every $1/n_0$ time units. This symbol interlacing can be accomplished by a commutating circuit such as the one shown in Fig. 7, which consists principally of a sampling circuit and a shift register of $n_0 - 1$ stages.

Fig. 7. Commutating circuit. (Output: serialized symbols. The sampling switches operate simultaneously every time unit; each stage of shift register has delay $1/n_0$ time units.)
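A behavioral sketch of the commutator in Python follows. It reproduces only the order in which symbols leave the circuit, not the sampling switches or the shift-register timing, and it assumes (our assumption, not stated for Fig. 7) that within each time unit line 1 is sent first, then line 2, and so on. The function name is illustrative.

```python
def commutate(parallel):
    """Serialize n0 synchronous output sequences into one channel stream.

    parallel[j][u] is the symbol on output line j + 1 at time unit u; within
    each time unit the lines are assumed to be sent in order 1, 2, ..., n0,
    one symbol every 1/n0 time units.
    """
    n0 = len(parallel)
    return [parallel[j][u] for u in range(len(parallel[0])) for j in range(n0)]


print(commutate([[1, 0], [0, 0], [1, 1]]))  # [1, 0, 1, 0, 0, 1]
```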

d.  Remarks  on  Encoding  Circuits 

The $Rn_A$-stage encoder described above is essentially the same as the canonic-form encoder developed by Wozencraft and Reiffen (Ref. 20). The $(1-R)n_A$-stage encoder for convolutional codes is not believed to have been described previously.

Since $Rn_A$ is the number of information symbols within the constraint length of the convolutional code and $(1-R)n_A$ is the number of parity symbols within the constraint length, a convolutional code can be encoded by a linear sequential network containing a number of storage devices (that is, stages of shift register) approximately equal to either, and hence to the minimum, of these two quantities. Peterson (Ref. 25) has proved this same result for block cyclic codes. Interest in this parallelism stems from the fact that, as shown in sections 2.3 and 2.5, convolutional codes satisfy both the Gilbert and random-coding bounds, whereas little is known about the behavior of cyclic codes of arbitrary length.

2.7  PARITY  CHECKING 

In section 1.1a we defined the parity checks of a linear code to be

$$s_j = \sum_{i=1}^{k} c_{ji} r_i - r_j, \qquad j = k+1, k+2, \ldots, n$$

and we saw that the parity checks furnished a set of $n - k$ equations in the $n$ noise variables $e_1, e_2, \ldots, e_n$. We call the process of forming the parity checks, given by Eq. 5, parity checking.

Before illustrating the manner in which parity checking can be performed for a convolutional code, we shall first prove a more general statement.

THEOREM 9: The parity checks for an $(n,k)$ code may be formed by subtracting the received parity symbols, $r_j$, $j = k+1, k+2, \ldots, n$, from the parity symbols obtained by encoding the received information symbols, $r_j$, $j = 1, 2, \ldots, k$.

PROOF 9: According to Eq. 2, encoding of the received information symbols will give the set of parity symbols, $t'_j$, where

$$t'_j = \sum_{i=1}^{k} c_{ji} r_i, \qquad j = k+1, k+2, \ldots, n. \qquad (57)$$

Subtraction of the received parity symbols then yields

$$t'_j - r_j = \sum_{i=1}^{k} c_{ji} r_i - r_j, \qquad j = k+1, k+2, \ldots, n, \qquad (58)$$

and this coincides with the definition of the parity checks given by Eq. 4, which proves the theorem.
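Theorem 9 is easy to check numerically. The Python sketch below uses a hypothetical $(7,4)$ binary code (the coefficients `C` are an illustrative choice, not taken from the text), forms the parity checks by re-encoding the received information symbols as in Eqs. 57 and 58, and compares them with the same checks written in the noise variables. Over the binary field, subtraction is the same as addition modulo 2.

```python
def encode_parity(info, C):
    """Eq. 57: t'_j = sum_i c_ji * r_i (mod 2), one entry per row of C."""
    return [sum(c * x for c, x in zip(row, info)) % 2 for row in C]


def checks_by_reencoding(received, C):
    """Theorem 9: re-encode the received information symbols, subtract received parity."""
    k = len(C[0])
    return [(t - r) % 2 for t, r in zip(encode_parity(received[:k], C), received[k:])]


def checks_from_noise(noise, C):
    """The same parity checks expressed in the noise variables e_1, ..., e_n."""
    k = len(C[0])
    return [(sum(c * e for c, e in zip(row, noise[:k])) - e_j) % 2
            for row, e_j in zip(C, noise[k:])]


# Hypothetical (7,4) code: the rows of C give c_ji for j = 5, 6, 7.
C = [[1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1]]
info = [1, 0, 1, 1]
codeword = info + encode_parity(info, C)          # [1, 0, 1, 1, 0, 1, 0]
noise = [0, 1, 0, 0, 0, 0, 1]
received = [(x + e) % 2 for x, e in zip(codeword, noise)]
print(checks_by_reencoding(received, C), checks_from_noise(noise, C))  # [1, 0, 0] [1, 0, 0]
```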
Fig. 8. Parity checker for convolutional codes.

We have already seen that the set of initial code words of a convolutional code forms an $(n,k)$ code, and hence Theorem 9 may be applied to such a code. It follows from this theorem that the circuit of Fig. 8 can be used to compute the parity checks of a convolutional code. The encoder in this circuit may be any convolutional encoder, such as the $Rn_A$-stage or the $(1-R)n_A$-stage encoders which have been described.

The received sequences, $R^{(j)}(D)$, $j = 1, 2, \ldots, n_0$, may be written in the $D$-notation

$$R^{(j)}(D) = r_0^{(j)} + r_1^{(j)} D + r_2^{(j)} D^2 + \cdots, \qquad j = 1, 2, \ldots, n_0 \qquad (59)$$

where $r_u^{(j)}$ is the symbol received on the $j$th input line at time $u$. With this notation, the noise sequences, $E^{(j)}(D)$, $j = 1, 2, \ldots, n_0$, are given by

$$E^{(j)}(D) = e_0^{(j)} + e_1^{(j)} D + e_2^{(j)} D^2 + \cdots, \qquad j = 1, 2, \ldots, n_0 \qquad (60)$$

where $e_u^{(j)}$ is the noise in the symbol received at time $u$ in the $j$th received sequence, that is,

$$R^{(j)}(D) = T^{(j)}(D) + E^{(j)}(D), \qquad j = 1, 2, \ldots, n_0. \qquad (61)$$

Then it follows from Theorem 9 that the parity-check sequences, $S^{(j)}(D)$, $j = k_0 + 1, k_0 + 2, \ldots, n_0$, are given by

$$S^{(j)}(D) = \left[G^{(j)}(D) R^{(1)}(D) + \cdots + Z^{(j)}(D) R^{(k_0)}(D)\right] - R^{(j)}(D), \qquad j = k_0 + 1, k_0 + 2, \ldots, n_0. \qquad (62)$$

Upon substitution of Eq. 29, Eq. 62 reduces to

$$S^{(j)}(D) = \left[G^{(j)}(D) E^{(1)}(D) + \cdots + Z^{(j)}(D) E^{(k_0)}(D)\right] - E^{(j)}(D), \qquad j = k_0 + 1, k_0 + 2, \ldots, n_0. \qquad (63)$$
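Equations 62 and 63 can be checked with a short Python sketch of binary polynomial arithmetic in $D$. Polynomials and sequences are coefficient lists, lowest power first, truncated to a fixed number of time units; the names and the noise pattern are illustrative assumptions. Forming the parity-check sequence from the received sequences, as the circuit of Fig. 8 does, gives the same result as forming it from the noise sequences alone, here for the rate-2/3 code with $G^{(3)}(D) = 1 + D$ and $H^{(3)}(D) = 1 + D^2$.

```python
def poly_mul_gf2(a, b, length):
    """Product of two binary polynomials in D, truncated to `length` terms."""
    out = [0] * length
    for i, a_i in enumerate(a):
        if a_i:
            for j, b_j in enumerate(b):
                if i + j < length:
                    out[i + j] ^= b_j
    return out


def parity_check_sequences(seqs, gens, length):
    """Eq. 62 (or Eq. 63 when `seqs` are noise sequences), binary case.

    seqs: n0 sequences, information sequences first; gens[p][i] is the
    polynomial multiplying sequence i in parity sequence p.
    """
    k0 = len(gens[0])
    result = []
    for p, row in enumerate(gens):
        s = list(seqs[k0 + p][:length])  # -R^{(j)}(D) equals +R^{(j)}(D) modulo 2
        for i, g in enumerate(row):
            s = [x ^ y for x, y in zip(s, poly_mul_gf2(g, seqs[i], length))]
        result.append(s)
    return result


gens = [[[1, 1], [1, 0, 1]]]  # G^(3)(D) = 1 + D, H^(3)(D) = 1 + D^2
sent = [[1, 0, 1, 1, 0], [0, 1, 1, 0, 1], [1, 0, 0, 1, 1]]
noise = [[0, 0, 1, 0, 0], [0, 0, 0, 0, 0], [0, 1, 0, 0, 0]]
received = [[x ^ e for x, e in zip(t, n)] for t, n in zip(sent, noise)]
print(parity_check_sequences(received, gens, 5))  # [[0, 1, 1, 1, 0]]
print(parity_check_sequences(noise, gens, 5))     # [[0, 1, 1, 1, 0]]
```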


In the parity-checking circuit of Fig. 8, it is assumed that the received symbols enter in sets of $n_0$ symbols each time unit. When the transmitted symbols have been serialized for transmission over a single channel (cf. section 2.6c), it is necessary to restore the original parallel timing by a de-commutating circuit such as the one shown in Fig. 9. This circuit operates by storing the received symbols until $n_0$ such symbols are received; at this time, the stored symbols are sampled to form the $n_0$ synchronous received sequences. As was the case for the commutating circuit of Fig. 7, this circuit for restoring the original timing consists principally of a sampling circuit and $n_0 - 1$ stages of shift register. It is of interest to note that the shift registers in the encoder and decoder are shifted once per time unit; only the shift registers in the commutating and de-commutating circuits are required to shift at the channel-symbol rate of once per $1/n_0$ time units.

Fig. 9. De-commutating circuit for serialized symbols. (Input: serialized digits; outputs $R^{(1)}, \ldots, R^{(n_0)}$. The sampling switches operate every time unit, commencing when the first received digit arrives at output line $R^{(1)}$; each stage of shift register has delay $1/n_0$.)
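A matching behavioral sketch of the de-commutator, in Python: received symbols are held until $n_0$ of them have arrived, and all $n_0$ are then sampled at once onto the parallel lines. A Python list stands in for the $n_0 - 1$ shift-register stages, the first symbol of each group is assumed to belong to line $R^{(1)}$, and the function name is illustrative.

```python
def decommutate(serial, n0):
    """Restore n0 synchronous received sequences from a serialized stream."""
    stored = []                       # plays the role of the n0 - 1 register stages
    lines = [[] for _ in range(n0)]
    for symbol in serial:
        if len(stored) < n0 - 1:
            stored.append(symbol)     # symbols arrive at the channel-symbol rate
            continue
        # n0-th symbol of the time unit: sample all n0 symbols simultaneously
        for j, bit in enumerate(stored + [symbol]):
            lines[j].append(bit)
        stored = []
    return lines


print(decommutate([1, 0, 1, 0, 0, 1], 3))  # [[1, 0], [0, 0], [1, 1]]
```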
III. CONVOLUTIONAL CODES FOR THRESHOLD DECODING


3.1 INTRODUCTION

The concept of threshold decoding formulated in Section I will now be applied to convolutional codes; this application will be continued in Sections IV and V. We shall construct codes to which the threshold-decoding algorithms can be efficiently applied, and present decoding circuits for these codes and data on code performance over several communication channels.


3.2 DECODING PROBLEM FOR CONVOLUTIONAL CODES


We  shall  present  those  features  of  the  decoding  problem  for  convolutional  codes 
that  are  necessary  for  understanding  the  principles  of  code  construction  for  threshold 
decoding. 

In section 2.2, it was shown that the initial code words of a convolutional code (i.e., all possible sets of the $n_A$ symbols transmitted from time 0 through time $m$) form a linear or $(n,k)$ code. Thus the remarks of section 1.2 concerning the decoding problem for linear codes apply to convolutional codes. In addition, there is an important special feature of convolutional decoding, namely the fact that only the set of first information symbols ($i_0^{(j)}$, $j = 1, 2, \ldots, k_0$) need be determined by the decoding algorithm from the $n_A$ symbols in the received initial code word. One could, of course, use fewer, or more, than $n_A$ received symbols in decoding the set of first information symbols. If fewer are used, a shorter code would suffice. If more are used, the error probability, $P_1(e)$, in decoding the set of first information symbols can be reduced; but, as pointed out by R. Gallager (private communication), the reduced error probability can be obtained more simply by using a code with greater $n_A$ to begin with.

Let us assume that we have an algorithm that permits us to decode the set of first information symbols from the $n_A$ symbols in the received initial code word. We shall denote the decoded estimates of the set of first information symbols as $i_0^{(j)*}$ to emphasize that these quantities may differ from the true values. Now consider the altered received sequences $R^{(j)*}(D)$ given by

$$R^{(j)*}(D) = R^{(j)}(D) - i_0^{(j)*}, \qquad j = 1, 2, \ldots, k_0 \qquad (64)$$

and

$$R^{(j)*}(D) = R^{(j)}(D) - \left[i_0^{(1)*} G^{(j)}(D) + \cdots + i_0^{(k_0)*} Z^{(j)}(D)\right], \qquad j = k_0 + 1, k_0 + 2, \ldots, n_0. \qquad (65)$$

From Eqs. 28 and 29, it follows that if the decoding were correct, that is, if $i_0^{(j)*} = i_0^{(j)}$ for $j = 1, 2, \ldots, k_0$, then the effect of the set of first information symbols has been removed from the transmitted sequences and hence from the received sequences. Thus the decoding of $i_1^{(j)}$, for $j = 1, 2, \ldots, k_0$, can be performed by using the same algorithm on the $n_A$ symbols from time 1 through time $m + 1$ of the altered received sequences, $R^{(j)*}(D)$, as was used to decode $i_0^{(j)}$, for $j = 1, 2, \ldots, k_0$, from the $n_A$ symbols from time 0 through time $m$ of the original received sequences, $R^{(j)}(D)$. By continuing this procedure, the decoding can proceed sequentially, with a new set of $k_0$ information symbols being decoded each time unit.
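The bookkeeping of Eqs. 64 and 65 is sketched below for the binary case, in Python. The helper removes the decoded first symbols $i_0^{(j)*}$ from the received sequences and then, as an implementation choice of ours rather than anything specified in the text, drops time 0 so that times $1$ through $m+1$ appear at positions $0$ through $m$. With correct decisions on a noiseless codeword of the rate-2/3 code with $G^{(3)}(D) = 1 + D$ and $H^{(3)}(D) = 1 + D^2$, the altered sequences coincide with the encoding of the information symbols from time 1 onward.

```python
def remove_first_symbols(received, decoded_first, gens):
    """Binary form of Eqs. 64 and 65, followed by a one-time-unit shift.

    received: n0 sequences (information first); decoded_first[i] is i_0^{(i+1)*};
    gens[p][i] is the polynomial multiplying information sequence i in parity p.
    """
    k0 = len(decoded_first)
    altered = [list(seq) for seq in received]
    for i, bit in enumerate(decoded_first):
        altered[i][0] ^= bit                      # Eq. 64
        for p, row in enumerate(gens):
            parity = altered[k0 + p]
            for d, g in enumerate(row[i]):
                if d < len(parity):
                    parity[d] ^= bit & g          # Eq. 65; subtraction = addition mod 2
    return [seq[1:] for seq in altered]           # next step starts at time 1


gens = [[[1, 1], [1, 0, 1]]]
codeword = [[1, 0, 1, 1, 0], [0, 1, 1, 0, 1], [1, 0, 0, 1, 1]]
print(remove_first_symbols(codeword, [1, 0], gens))
# [[0, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1]]
```

Here $[1, 0, 1, 1]$ is indeed $(1 + D)(D + D^2) + (1 + D^2)(1 + D + D^3)$ truncated to four terms. Passing a wrong decision such as `[0, 0]` instead leaves the spurious pattern of Eq. 66 in the parity sequence.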

An obvious difficulty arises when a decoding error is made. For, suppose that the set of first information symbols is incorrectly decoded; then the operation of Eq. 65 introduces a spurious pattern of symbols into the received parity sequences, namely

$$\left[i_0^{(1)} - i_0^{(1)*}\right] G^{(j)}(D) + \cdots + \left[i_0^{(k_0)} - i_0^{(k_0)*}\right] Z^{(j)}(D), \qquad j = k_0 + 1, k_0 + 2, \ldots, n_0. \qquad (66)$$

These spurious symbols affect the decoding algorithm in much the same manner as a burst of channel noise of length $n_A$, and hence it can be expected that successive sets of information symbols will have a high probability of being incorrectly decoded. This is the familiar "propagation effect" of errors made in decoding convolutional codes.

The two basic methods by which this effect can be circumvented, namely "resynchronization" and "error-counting," will now be explained.

In the "resynchronization" method, arbitrary information symbols are encoded for only a fixed number, $N$, of time units. "Zeros" are then encoded for the next $m$ time units. Since no digits more than $m$ time units in the past are used in the decoding process, the decoding can be carried out independently on the symbols received in the corresponding $N + m$ time units, after which the decoder is cleared and the decoding process is started anew on the next received symbols. Thus any propagation of error is confined within $N + m$ time units, or $(N+m)n_0$ received symbols. The resynchronization method reduces the information rate to $R'$ given by

$$R' = \frac{N}{N+m}\, R \qquad (67)$$

where $R = k_0/n_0$ is the nominal rate of the convolutional code. If $N \gg m$, the rate decrease is not substantial.
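Equation 67 in a few lines of Python, with exact fractions (function names are illustrative). The figures use $N = 9m$ for a rate-1/2 code with $m = 511$, the choice discussed for self-orthogonal codes in section 3.5, and also report the number of received symbols, $(N+m)n_0$, within which an error propagation is confined.

```python
from fractions import Fraction


def resynchronized_rate(k0, n0, N, m):
    """Eq. 67: R' = N / (N + m) * R, with nominal rate R = k0 / n0."""
    return Fraction(N, N + m) * Fraction(k0, n0)


def confinement_symbols(n0, N, m):
    """Received symbols in one resynchronization period of N + m time units."""
    return (N + m) * n0


m = 511
print(resynchronized_rate(1, 2, 9 * m, m))  # 9/20
print(confinement_symbols(2, 9 * m, m))     # 10220
```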

The "error-counting" method is a more refined manner of combating error propagation. This method was proposed by Wozencraft and Reiffen (in a slightly different form than that given here) to turn error propagation to useful advantage in a two-way communication system (Ref. 26). Briefly, the method operates as follows: When the decoding algorithm is "correcting more errors" over some span of received bits than the code itself can reliably correct, it is assumed that a decoding error has been made, or that the channel has temporarily become too noisy for reliable operation. In this event, the receiver asks that the data be retransmitted from some point before that at which the high incidence of errors began. This method will be considered in more detail in section 4.2c.
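One way to picture the error-counting rule is the Python sketch below. The window length and the threshold are hypothetical parameters (the text defers the details to section 4.2c), as is the choice to request retransmission from the start of the window in which the excess of corrections was observed. The input is the number of errors corrected by the decoder in each time unit.

```python
from collections import deque


def retransmission_point(corrections, window, threshold):
    """Return the time unit from which retransmission would be requested, or None.

    corrections[u]: number of errors the decoder corrected at time unit u.
    A request is made as soon as more than `threshold` corrections fall within
    the last `window` time units (both values are illustrative assumptions).
    """
    recent = deque()
    total = 0
    for u, count in enumerate(corrections):
        recent.append(count)
        total += count
        if len(recent) > window:
            total -= recent.popleft()
        if total > threshold:
            return max(0, u - window + 1)  # start of the offending window
    return None


print(retransmission_point([0, 1, 0, 0, 1, 1, 1, 0], window=4, threshold=2))  # 3
```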

The main point of the preceding discussion is that the threshold-decoding algorithm need be tailored only to give the decoded estimates of $e_0^{(j)}$, $j = 1, 2, \ldots, k_0$, the errors in the set of first information symbols. The remainder of the decoding process, namely the altering of the received sequences to remove the effect of the first symbols and the controlling of the error-propagation effect, can be handled in routine fashion. Thus the problem of convolutional code construction for threshold decoding reduces to finding codes for which the mapping from the ordinary parity checks to the sets of parity checks orthogonal on $e_0^{(j)}$, $j = 1, 2, \ldots, k_0$, can be carried out in an efficient manner, since these are the only noise digits that must be determined by the decoding process.


3.3  PARITY-CHECK  STRUCTURE 

From the foregoing remarks, it is clear that a careful investigation of the parity-check structure of convolutional codes will be needed. In Eq. 63, it was seen that the parity-check sequences are given by

$$S^{(j)}(D) = \left[G^{(j)}(D) E^{(1)}(D) + \cdots + Z^{(j)}(D) E^{(k_0)}(D)\right] - E^{(j)}(D), \qquad j = k_0 + 1, k_0 + 2, \ldots, n_0. \qquad (68)$$

In our usual manner, we represent these sequences by the notation

$$S^{(j)}(D) = s_0^{(j)} + s_1^{(j)} D + s_2^{(j)} D^2 + \cdots, \qquad (69)$$

where $s_u^{(j)}$ is then the parity check in sequence $S^{(j)}(D)$ that is formed at time $u$ by the circuit of Fig. 8. From Eq. 68, we have


$$s_u^{(j)} = \left[g_u^{(j)} e_0^{(1)} + g_{u-1}^{(j)} e_1^{(1)} + \cdots + g_0^{(j)} e_u^{(1)}\right] + \cdots + \left[z_u^{(j)} e_0^{(k_0)} + z_{u-1}^{(j)} e_1^{(k_0)} + \cdots + z_0^{(j)} e_u^{(k_0)}\right] - e_u^{(j)}. \qquad (70)$$

Since $g_u^{(j)} = 0$ for $u > m$, no parity check will check noise digits more than $m$ time units in the past.

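In matrix form, Eq. 70 may be written for the parity checks from time 0 through time $m$ as

$$\begin{bmatrix} s_0^{(k_0+1)} \\ \vdots \\ s_m^{(k_0+1)} \\ \vdots \\ s_0^{(n_0)} \\ \vdots \\ s_m^{(n_0)} \end{bmatrix} = \left[ H_\Delta : -I_{(m+1)(n_0-k_0)} \right] \begin{bmatrix} e_0^{(1)} \\ \vdots \\ e_m^{(1)} \\ \vdots \\ e_0^{(n_0)} \\ \vdots \\ e_m^{(n_0)} \end{bmatrix} \qquad (71)$$

where $I_{(m+1)(n_0-k_0)}$ is the unit matrix of dimension $(m+1)(n_0-k_0)$, and $H_\Delta$ is the matrix

$$H_\Delta = \left[\begin{array}{cccc|c|cccc}
g_0^{(k_0+1)} & & & & & z_0^{(k_0+1)} & & & \\
g_1^{(k_0+1)} & g_0^{(k_0+1)} & & & & z_1^{(k_0+1)} & z_0^{(k_0+1)} & & \\
\vdots & & \ddots & & \cdots & \vdots & & \ddots & \\
g_m^{(k_0+1)} & \cdots & g_1^{(k_0+1)} & g_0^{(k_0+1)} & & z_m^{(k_0+1)} & \cdots & z_1^{(k_0+1)} & z_0^{(k_0+1)} \\
\hline
\vdots & & & & & & & & \vdots \\
\hline
g_0^{(n_0)} & & & & & z_0^{(n_0)} & & & \\
\vdots & & \ddots & & \cdots & \vdots & & \ddots & \\
g_m^{(n_0)} & \cdots & g_1^{(n_0)} & g_0^{(n_0)} & & z_m^{(n_0)} & \cdots & z_1^{(n_0)} & z_0^{(n_0)}
\end{array}\right] \qquad (72)$$

The nonzero entries of $H_\Delta$ are seen to lie entirely within the set of $(n_0 - k_0)k_0$ triangular submatrices to which we shall refer as the parity triangles. There are $n_0 - k_0$ rows of parity triangles in $H_\Delta$, and $k_0$ columns of parity triangles.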


The structure of the parity triangles will be of considerable utility in constructing codes for threshold decoding. Each parity triangle has the following general form

$$\begin{bmatrix} g_0 & & & & & \\ g_1 & g_0 & & & & \\ g_2 & g_1 & g_0 & & & \\ g_3 & g_2 & g_1 & g_0 & & \\ \vdots & & & & \ddots & \\ g_m & g_{m-1} & \cdots & \cdots & g_1 & g_0 \end{bmatrix}$$

In other words, each parity triangle has a structure that is determined by one of the code-generating polynomials. The entire parity triangle is determined uniquely by the first column, or the last row.
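The parity triangles and the binary form of the matrix $[H_\Delta : -I]$ (where $-I = I$) can be generated mechanically. The Python sketch below builds them from coefficient lists; the function names and the layout of `gens` (`gens[p][i]` is the polynomial multiplying information sequence $i$ in parity sequence $p$, lowest power of $D$ first) are our own. For the rate-2/3 code with $G^{(3)}(D) = 1 + D$, $H^{(3)}(D) = 1 + D^2$ and $m = 2$, considered next, it returns one row per parity check $s_0^{(3)}, s_1^{(3)}, s_2^{(3)}$.

```python
def parity_triangle(g, m):
    """(m+1) x (m+1) lower-triangular matrix whose row u is g_u, g_{u-1}, ..., g_0."""
    g = list(g) + [0] * (m + 1 - len(g))
    return [[g[u - v] if v <= u else 0 for v in range(m + 1)] for u in range(m + 1)]


def check_matrix(gens, m):
    """Rows of [H_Delta : I] for a binary code (-I equals I modulo 2).

    Columns: e_0..e_m of each information sequence, then of each parity sequence.
    """
    n_parity = len(gens)
    rows = []
    for p, row in enumerate(gens):
        triangles = [parity_triangle(g, m) for g in row]
        for u in range(m + 1):
            info_part = [bit for tri in triangles for bit in tri[u]]
            unit = [0] * ((m + 1) * n_parity)
            unit[p * (m + 1) + u] = 1
            rows.append(info_part + unit)
    return rows


for r in check_matrix([[[1, 1], [1, 0, 1]]], 2):
    print(r)
# [1, 0, 0, 1, 0, 0, 1, 0, 0]
# [1, 1, 0, 0, 1, 0, 0, 1, 0]
# [0, 1, 1, 1, 0, 1, 0, 0, 1]
```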

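An example at this point will be helpful in clarifying the parity-check structure. Consider a binary convolutional code with rate $R = k_0/n_0 = 2/3$ and $n_A = (m+1)n_0 = 9$. We choose the code-generating polynomials to be $G^{(3)}(D) = 1 + D$ and $H^{(3)}(D) = 1 + D^2$. Then Eq. 71 becomes ($-1 = +1$ in the binary number field, since $1 + 1 = 0$ implies $1 = -1$)

$$\begin{bmatrix} s_0^{(3)} \\ s_1^{(3)} \\ s_2^{(3)} \end{bmatrix} = \left[\begin{array}{ccc|ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 1 & 0 & 0 & 1 \end{array}\right] \begin{bmatrix} e_0^{(1)} \\ e_1^{(1)} \\ e_2^{(1)} \\ e_0^{(2)} \\ e_1^{(2)} \\ e_2^{(2)} \\ e_0^{(3)} \\ e_1^{(3)} \\ e_2^{(3)} \end{bmatrix} \qquad (73)$$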


which is the matrix representation of the following equations:

$$\begin{aligned} s_0^{(3)} &= e_0^{(1)} + e_0^{(2)} + e_0^{(3)} \\ s_1^{(3)} &= e_0^{(1)} + e_1^{(1)} + e_1^{(2)} + e_1^{(3)} \\ s_2^{(3)} &= e_1^{(1)} + e_2^{(1)} + e_0^{(2)} + e_2^{(2)} + e_2^{(3)} \end{aligned} \qquad (74)$$


From (74) and (73), it is easily seen, by comparison, that the manner in which the noise bits enter into the parity checks can be read directly from the matrix $[H_\Delta : -I]$. Each row of $[H_\Delta : -I]$ gives the coefficients of the noise bits in one parity check.

It should be noticed that $s_0^{(3)}$ and $s_1^{(3)}$ form a set of two parity checks orthogonal on $e_0^{(1)}$, since both check $e_0^{(1)}$ but no other noise bit is checked by both. Similarly, $s_0^{(3)}$ and $s_2^{(3)}$ can be seen to form a set of two parity checks orthogonal on $e_0^{(2)}$. It now follows from Theorem 1 that the noise bits in the first set of information symbols, namely $e_0^{(1)}$ and $e_0^{(2)}$, can be correctly determined by the majority decoding algorithm provided that there are one or fewer errors in the first $n_A = 9$ received symbols. Thus, as we have explained, this same majority decoding algorithm can be used to correct all errors in the received sequences, provided that the 9 symbols received in any 3 consecutive time units never contain more than a single error.
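The complete procedure for this example can be sketched in Python. The parity checks are formed by re-encoding the received information bits and adding the received parity bit (Theorem 9); $e_u^{(1)}$ is then decided from the orthogonal pair $s_u^{(3)}, s_{u+1}^{(3)}$ and $e_u^{(2)}$ from $s_u^{(3)}, s_{u+2}^{(3)}$, a noise bit being declared 1 only when both of its checks are 1 (the majority rule for $J = 2$). The effect of each decision is then removed from the later checks, which is the parity-check counterpart of Eqs. 64 and 65. Working on the check sequence rather than on the altered received sequences, and decoding only time units $0$ through $L - 1 - m$ of a block of $L$ time units, are simplifications of ours; the helper names are illustrative.

```python
G, H, M = (1, 1), (1, 0, 1), 2  # G^(3)(D) = 1 + D, H^(3)(D) = 1 + D^2, memory m


def encode(info1, info2):
    """Systematic encoding: the third sequence is (1 + D) I1(D) + (1 + D^2) I2(D)."""
    parity = []
    for u in range(len(info1)):
        bit = 0
        for d, c in enumerate(G):
            if u >= d:
                bit ^= c & info1[u - d]
        for d, c in enumerate(H):
            if u >= d:
                bit ^= c & info2[u - d]
        parity.append(bit)
    return [list(info1), list(info2), parity]


def majority_decode(received):
    """Decode the information bits of time units 0 .. L-1-m."""
    r1, r2, r3 = received
    s = [a ^ b for a, b in zip(encode(r1, r2)[2], r3)]  # parity checks s_u^(3)
    out1, out2 = [], []
    for u in range(len(r1) - M):
        e1 = s[u] & s[u + 1]        # both checks orthogonal on e_u^(1) fail
        e2 = s[u] & s[u + 2]        # both checks orthogonal on e_u^(2) fail
        for d, c in enumerate(G):   # remove e_u^(1) from the later checks
            s[u + d] ^= e1 & c
        for d, c in enumerate(H):   # remove e_u^(2) from the later checks
            s[u + d] ^= e2 & c
        out1.append(r1[u] ^ e1)
        out2.append(r2[u] ^ e2)
    return out1, out2


def with_error(seqs, j, u):
    """Copy of `seqs` with the symbol at time u of sequence j inverted."""
    flipped = [list(seq) for seq in seqs]
    flipped[j][u] ^= 1
    return flipped


info1, info2 = [1, 0, 1, 1, 0, 0, 1, 0], [0, 1, 1, 0, 1, 0, 0, 1]
sent = encode(info1, info2)
ok = all(majority_decode(with_error(sent, j, u)) == (info1[:6], info2[:6])
         for j in range(3) for u in range(8))
print(ok)  # True
```

Every single-symbol error in the eight time units is corrected, as the argument above predicts.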

We shall now show how the concepts in this example can be generalized to yield an interesting and useful set of codes. In the rest of this section, we shall consider binary codes only.


3.4  BOUNDS  ON  CODES 

It is convenient to introduce the following definitions at this point. Let $\{A_i\}$ be a set of $J$ parity checks orthogonal on $e_0^{(1)}$ for a binary convolutional code with rate $R = 1/n_0$. The number of noise bits, exclusive of $e_0^{(1)}$, which are checked by some $A_i$, will be called the size of $A_i$ and will be denoted $n_i$.

The total number of distinct noise bits checked by the $\{A_i\}$ will be called the effective constraint length of the code and will be denoted $n_E$.

From this definition, it follows immediately that

$$n_E = 1 + \sum_{i=1}^{J} n_i \qquad (75)$$

and, of course,

$$n_E \le n_A \qquad (76)$$

since $n_A$ is the total number of distinct bits checked by the parity checks, $s_u^{(j)}$, which are combined to form the $\{A_i\}$.
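Equation 75 relies on orthogonality: apart from $e_0^{(1)}$, no noise bit is shared by two of the $A_i$, so the sizes simply add. The Python sketch below checks that property for parity checks given as sets of noise-bit labels $(j, u)$ standing for $e_u^{(j)}$, and returns the sizes and $n_E$. The sample checks are $s_0^{(2)}, s_1^{(2)}, s_3^{(2)}$ of a hypothetical rate-1/2 code with $G^{(2)}(D) = 1 + D + D^3$; the label format and function name are our own.

```python
def sizes_and_effective_length(checks, target):
    """Return ([n_1, ..., n_J], n_E) for parity checks orthogonal on `target`.

    checks: iterable of sets of noise-bit labels; each set must contain `target`,
    and no other label may appear in more than one set.
    """
    seen, sizes = set(), []
    for check in checks:
        others = set(check) - {target}
        if target not in check or others & seen:
            raise ValueError("the checks are not orthogonal on the target bit")
        seen |= others
        sizes.append(len(others))
    return sizes, 1 + sum(sizes)  # Eq. 75


# Labels (j, u) stand for e_u^(j); code G^(2)(D) = 1 + D + D^3, checks s_0, s_1, s_3.
checks = [
    {(1, 0), (2, 0)},
    {(1, 1), (1, 0), (2, 1)},
    {(1, 3), (1, 2), (1, 0), (2, 3)},
]
print(sizes_and_effective_length(checks, (1, 0)))  # ([1, 2, 3], 7)
```

For this code $m = 3$, so $n_A = 8$ and Eq. 76 holds with $n_E = 7$.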
THEOREM 10: For any rate $R = 1/n_0$ and any integer $J$, there exists a convolutional code for which a set $\{A_i\}$ of $J$ parity checks orthogonal on $e_0^{(1)}$ can be formed in such a manner that the effective constraint length satisfies

$$n_E \le \frac{1}{2}\,\frac{R}{1-R}\left[J^2 + \frac{1-R}{R}\,r - r^2\right] + \frac{1}{2}J + 1 \qquad (77)$$

where $r = J \bmod (n_0 - 1)$.

The proof of Theorem 10 is given in Appendix C. This theorem is proved by showing that it is possible to construct a convolutional code so that there are $J$ parity checks, $s_u^{(j)}$, which check $e_0^{(1)}$ and are such that these parity checks themselves form a set of parity checks orthogonal on $e_0^{(1)}$. In this construction, the sizes, $n_i$, of these parity checks are as stated in the following corollary:

COROLLARY: For any rate $R = 1/n_0$ and any integer $J$, there exists a binary convolutional code for which $J$ parity checks orthogonal on $e_0^{(1)}$ can be formed so that $n_0 - 1$ of these parity checks have size $j$ for each $j$ from 1 through $Q$, and $r$ have size $Q + 1$, with

$$J = Q(n_0 - 1) + r, \qquad 0 \le r < n_0 - 1. \qquad (78)$$
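Substituting the corollary's sizes into Eq. 75 gives $n_E = 1 + (n_0-1)Q(Q+1)/2 + r(Q+1)$, which simplifies, using $R/(1-R) = 1/(n_0-1)$ for $R = 1/n_0$, to $\frac{1}{2}\frac{R}{1-R}\left[J^2 + \frac{1-R}{R} r - r^2\right] + \frac{1}{2}J + 1$; this is how we read the bound of Eq. 77. A short Python check of the arithmetic with exact fractions (function names are our own):

```python
from fractions import Fraction


def n_e_from_corollary(J, n0):
    """Eq. 75 with the corollary's sizes: n0 - 1 checks of each size 1..Q, r of size Q + 1."""
    Q, r = divmod(J, n0 - 1)
    sizes = [size for size in range(1, Q + 1) for _ in range(n0 - 1)] + [Q + 1] * r
    return 1 + sum(sizes)


def n_e_closed_form(J, n0):
    """(1/2) R/(1-R) [J^2 + ((1-R)/R) r - r^2] + J/2 + 1 with R = 1/n0."""
    R = Fraction(1, n0)
    r = J % (n0 - 1)
    return Fraction(1, 2) * R / (1 - R) * (J**2 + (1 - R) / R * r - r**2) + Fraction(J, 2) + 1


print(n_e_from_corollary(5, 3), n_e_closed_form(5, 3))    # 10 10
print(n_e_from_corollary(10, 2), n_e_closed_form(10, 2))  # 56 56
```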


For rates $R = k_0/n_0$, where $k_0 \ne 1$, Theorem 10 can be generalized in the following manner: The effective constraint length, $n_E$, is defined as the maximum over $j$ of the number of distinct noise bits that appear in the set of $J$ parity checks orthogonal on $e_0^{(j)}$, for $j = 1, 2, \ldots, k_0$. With this definition we have the following theorem, the proof of which is also given in Appendix C.

THEOREM 11: For any rate $R = k_0/n_0$ and any integer $J$, there exists a convolutional code for which $J$ parity checks orthogonal on $e_0^{(j)}$, for $j = 1, 2, \ldots, k_0$, can be formed so that the effective constraint length, $n_E$, satisfies

[Eq. 79: a bound on $n_E$ in terms of $R$, $J$, $k_0$, and $r$; the expression is not legible in this copy.] (79)

where $r = J \bmod (n_0 - k_0)$.

For $k_0 = 1$, Theorem 11 reduces to Theorem 10, as expected.

An estimate of the quality of the codes which satisfy (79) or (77) can be obtained as follows. The Gilbert bound, Eq. 47, states that the ratio, $d/n_A$, of minimum distance to constraint length can be made to satisfy $H(d/n_A) = 1 - R$. Minimum distance $d$ implies that the first set of information symbols can always be correctly decoded whenever $\lfloor (d-1)/2 \rfloor$ or fewer errors occur among the $n_A$ symbols within the constraint length. On the other hand, we have seen that $e_0^{(1)}$ can be correctly decoded by majority decoding whenever $\lfloor J/2 \rfloor$ or fewer errors occur among the $n_E$ symbols within the effective constraint length. Thus the ratio $J/2n_E$ ought to be at least as great as the ratio $(d-1)/2n_A$ guaranteed by the Gilbert bound if threshold decoding is to be efficient.

In Fig. 10, we compare the ratio $J/2n_E$ guaranteed by Theorems 10 and 11 to the ratio $(d-1)/2n_A$ guaranteed by the asymptotic Gilbert bound of Eq. 47. (Comparison with the nonasymptotic bound of (43) is the proper choice but is difficult to present graphically. However, the comparison is hardly affected, since for $n_A \approx 100$, $d/2n_A$ is nearly the same for the asymptotic and nonasymptotic bounds. For example, at $n_A = 100$ and $R = 1/2$, the former gives .055 and the latter gives .060.) For a wide range of rates, the ratio $J/2n_E$ does not fall below the Gilbert bound until the effective constraint length becomes on the order of 100 to 150 bits. It is for codes up to these effective constraint lengths that threshold decoding can be expected to be efficient. Clearly, for codes that satisfy Theorem 10 with the equality sign, the $J/n_E$ ratio must eventually become poor, since $J$ increases only as the square root of $n_E$, whereas the Gilbert bound shows that it is possible to make $d$ grow linearly with $n_A$.

Fig. 10. Ratio of number of orthogonal parity checks, $J$, to twice the effective constraint length, $n_E$, as given by Theorems 10 and 12 and by comparison with the asymptotic Gilbert bound.
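The rate-1/2 case of this comparison can be reproduced numerically. The Python sketch below solves $H(\delta) = 1 - R$ for the asymptotic Gilbert ratio $\delta = d/n_A$ by bisection, then looks for the first $J$ at which $J/2n_E$, with $n_E$ taken from the corollary to Theorem 10, falls below $\delta/2$ (the $-1$ in $(d-1)/2n_A$ is neglected, as it is asymptotically). The function names are our own, and the printed values are only a check on this arithmetic, not a reproduction of Fig. 10.

```python
import math


def binary_entropy(x):
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def gilbert_ratio(R, tol=1e-12):
    """delta in (0, 1/2) with H(delta) = 1 - R, found by bisection."""
    lo, hi = 1e-15, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_entropy(mid) < 1 - R:  # H is increasing on (0, 1/2)
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def first_shortfall(n0, J_max=1000):
    """Smallest J (with its n_E) for which J / (2 n_E) < delta / 2, rate 1/n0."""
    target = gilbert_ratio(1 / n0) / 2
    for J in range(1, J_max + 1):
        Q, r = divmod(J, n0 - 1)
        n_e = 1 + (n0 - 1) * Q * (Q + 1) // 2 + r * (Q + 1)
        if J / (2 * n_e) < target:
            return J, n_e
    return None


print(round(gilbert_ratio(0.5) / 2, 4))  # 0.055
print(first_shortfall(2))                # (18, 172)
```

The first value agrees with the .055 quoted above for $R = 1/2$.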

We have not been successful in finding codes for which $n_E$ is substantially smaller than the bounds implied by Theorems 10 and 11, for rates 1/10 and greater. On the other hand, we have been unable to prove that it is impossible to find such codes, except for the particular case $R = 1/2$. In this case we have been able to show that the bound of Theorem 10 is also a lower bound on $n_E$.

THEOREM 12: For rate $R = 1/2$ and any integer $J$, it is possible to find a convolutional code which is such that $J$ parity checks orthogonal on $e_0^{(1)}$ can be constructed and for which

$$n_E = \frac{1}{2}J^2 + \frac{1}{2}J + 1. \qquad (80)$$

Conversely, it is impossible to find such a code for which $n_E$ is smaller than this number.

PROOF 12: The proof that it is possible to make $n_E$ at least as small as the number in Eq. 80 follows by substituting $R = 1/2$ in (77). The proof that it is impossible to do better than this bound proceeds as follows.

The first step in the proof is to place a lower bound on the number of "ones" that must be contained in certain sums of columns of $H_\Delta$. In the same manner that Eq. 71 was derived from Eq. 68, it can be readily shown that for an $R = 1/2$ code, the parity sequence $T^{(2)}(D)$ is given in terms of the information sequence $I^{(1)}(D)$ by the matrix product

$$\begin{bmatrix} t_0^{(2)} \\ t_1^{(2)} \\ \vdots \\ t_m^{(2)} \end{bmatrix} = H_\Delta \begin{bmatrix} i_0^{(1)} \\ i_1^{(1)} \\ \vdots \\ i_m^{(1)} \end{bmatrix} \qquad (81)$$

The parity sequence as a vector is then the sum of the columns of $H_\Delta$ corresponding to the information symbols that are "ones." Suppose there is some sum of $C$ columns of $H_\Delta$, including the first column, which contains $N$ "ones." Equation 81 then implies that there is an initial code word, for which $i_0^{(1)} = 1$, which has $C$ nonzero information symbols and $N$ nonzero parity symbols. The weight of this initial code word is then $C + N$ and must be at least $d$, the minimum distance of the code. Hence we must have $N \ge d - C$; that is, any sum of $C$ columns of $H_\Delta$, including the first column, must have at least $d - C$ ones.

For $R = 1/2$, the matrix $H_\Delta$ consists of a single parity triangle, namely

$$H_\Delta = \begin{bmatrix} g_0^{(2)} & & & \\ g_1^{(2)} & g_0^{(2)} & & \\ \vdots & & \ddots & \\ g_m^{(2)} & \cdots & g_1^{(2)} & g_0^{(2)} \end{bmatrix} \qquad (82)$$

The second step in the proof is to notice the row-column equivalence of $H_\Delta$. Specifically, we observe that if only the first four rows of $H_\Delta$ are retained, then the sum of the second and fourth rows contains exactly the same number of "ones" as the sum of the first and third columns, since the same terms are added together. In general, there is a one-to-one correspondence between sums of rows that include the last row and sums of columns that include the first column. Thus we can conclude, from the previous paragraph, that any sum of $C$ rows of $H_\Delta$, including the last row, must have at least $d - C$ ones.

Assume now that we have an $R = 1/2$ code for which a set $A_1, A_2, \ldots, A_J$ of parity checks orthogonal on $e_0^{(1)}$ has been formed. The coefficients in each $A_i$ correspond, as we have seen, to the sum of selected rows of the matrix $[H_\Delta : I]$. We assume that the parity checks have been ordered so that if $i < j$, then the lowermost row added in forming $A_j$ is beneath the lowermost row added to form $A_i$. Now consider the $R = 1/2$ convolutional code for which the last row in its parity triangle is the lowermost row added to form $A_i$. The minimum distance, $d_i$, of this code must be at least $i + 1$ according to the corollary of Theorem 1, since it is possible to construct $i$ parity checks orthogonal on $e_0^{(1)}$, namely $A_1, A_2, \ldots, A_i$. Finally, suppose that $C_i$ rows of $[H_\Delta : I]$ were added to form $A_i$. Then the size, $n_i$, of this parity check must satisfy

$$n_i \ge \left[(d_i - C_i) - 1\right] + C_i \qquad (83)$$

as we can see in the following way. The number of information noise bits checked by $A_i$, exclusive of $e_0^{(1)}$, is one less than the number of "ones" in the sum of the $C_i$ rows of $H_\Delta$. Since this sum includes the last row, it must have at least $d_i - C_i$ "ones." The last term on the right of (83) is accounted for by the fact that the sum of $C_i$ rows of the identity matrix, $I$, always contains $C_i$ "ones," and this is the number of parity noise bits checked by $A_i$.

Since $d_i$ is at least $i + 1$, (83) becomes simply

$$n_i \ge i. \qquad (84)$$

Substituting (84) in (75), we obtain

$$n_E \ge \sum_{i=1}^{J} i + 1 = \frac{1}{2}J^2 + \frac{1}{2}J + 1 \qquad (85)$$

and this is the statement of the theorem.
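The row-column equivalence used in the second step of the proof can be confirmed by brute force. Pairing row $r$ of a parity triangle with column $m - r$ (so that the last row is paired with the first column) matches every sum of rows with a sum of columns containing the same number of "ones." The Python sketch below tests every row subset that includes the last row, for a few arbitrary coefficient vectors; it is a numerical check, not a proof, and the names are our own.

```python
from itertools import combinations


def parity_triangle(g):
    """Lower-triangular matrix whose row u is g_u, g_{u-1}, ..., g_0 (m = len(g) - 1)."""
    m = len(g) - 1
    return [[g[u - v] if v <= u else 0 for v in range(m + 1)] for u in range(m + 1)]


def ones_in_row_sum(T, rows):
    return sum(sum(T[r][c] for r in rows) % 2 for c in range(len(T)))


def ones_in_column_sum(T, cols):
    return sum(sum(T[r][c] for c in cols) % 2 for r in range(len(T)))


def row_column_equivalence(g):
    """True if every row sum including the last row matches its paired column sum."""
    T = parity_triangle(g)
    m = len(g) - 1
    for size in range(1, m + 2):
        for rows in combinations(range(m + 1), size):
            if m in rows:
                cols = [m - r for r in rows]  # last row <-> first column
                if ones_in_row_sum(T, rows) != ones_in_column_sum(T, cols):
                    return False
    return True


print(all(row_column_equivalence(g) for g in ([1, 1, 0, 1], [1, 0, 1, 1, 0, 1], [1, 1, 1, 1, 1])))
# True
```

With $m = 3$, rows 1 and 3 pair with columns 2 and 0, which is the "second and fourth rows" versus "first and third columns" case in the text.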

The fact that, for $R = 1/2$, Theorem 10 gives the smallest possible value of $n_E$ tempts one to conjecture that the same result might hold for other rates. This conjecture is not valid. In the following sections, many codes will be constructed for which $n_E$ is somewhat smaller than the upper bound of Theorem 10 when $R \ne 1/2$.
3.5 SELF-ORTHOGONAL CODES


We now consider in more detail those convolutional codes for which the set of parity checks orthogonal on $e_0^{(i)}$ is of the simplest kind. We define a self-orthogonal convolutional code to be a code in which the set of all parity checks, $s_u^{(j)}$, that check $e_0^{(i)}$, for $i = 1, 2, \ldots, k_0$, is itself a set of parity checks orthogonal on $e_0^{(i)}$.

With this definition, the codes used in the constructive proof of Theorems 10 and 11 are self-orthogonal codes. Let us restrict our attention to these codes and to rate R = 1/2. In this case there is only a single parity triangle, Eq. 82, corresponding to the single code-generating polynomial $G^{(2)}(D)$. The construction used to prove Theorem 10 gives the code for which the nonzero terms in $G^{(2)}(D)$ are

$$g^{(2)}_{2^{i-1}-1} = 1, \qquad i = 1, 2, \ldots, J. \qquad (86)$$

The degree of $G^{(2)}(D)$ then is $m = 2^{J-1} - 1$. The set of J parity checks that check on $e_0^{(1)}$, namely

$$s^{(2)}_{2^{i-1}-1}, \qquad i = 1, 2, \ldots, J, \qquad (87)$$

is a set of J parity checks orthogonal on $e_0^{(1)}$ having an effective constraint length

$$n_E = \tfrac{1}{2}J^2 + \tfrac{1}{2}J + 1. \qquad (88)$$


On the other hand, the actual constraint length, $n_A$, is

$$n_A = n_0(m + 1) = 2^J, \qquad (89)$$

from which fact we see that, for even moderate values of J, $n_A$ is much greater than $n_E$.

A large ratio of $n_A$ to $n_E$ is generally undesirable for two reasons. First, both encoder complexity (cf. sec. 2.7) and decoder complexity (cf. sec. 4.2c) are directly proportional to $n_A$ rather than to $n_E$. Second, the resynchronization period may be unacceptably long (cf. sec. 3.2) and thus may not provide an effective safeguard against the propagation of a decoding error. This latter fact can be seen best from an example. It follows from Eq. 67 that N = 9m is a reasonable choice if the actual information rate is not to be substantially less than the nominal rate R = 1/2. The resynchronization period, N + m, is then 10m time units. For J = 10, we have $m = 2^9 - 1 = 511$, and thus the resynchronization period is 5110 time units. On the other hand, $n_E = 56$ for this code, and hence if $n_A$ and $n_E$ were equal (which would imply m = 27) the resynchronization period would be only 270 time units.
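The arithmetic of this example can be reproduced in a few lines. The following is a minimal Python sketch. It assumes N = 9m, as suggested in the text, and $n_A = 2(m+1)$ for R = 1/2. The function name `resync_period` is hypothetical.

```python
def resync_period(m: int, factor: int = 9) -> int:
    """Resynchronization period N + m, with N = factor * m (N = 9m per the text)."""
    return factor * m + m


J = 10
m_thm10 = 2 ** (J - 1) - 1          # degree of G^(2)(D) for the Theorem 10 code
n_e = J * (J + 1) // 2 + 1          # Eq. 88
m_equal = n_e // 2 - 1              # solve n_A = 2(m + 1) = n_E for m

print(m_thm10, resync_period(m_thm10))        # 511 5110
print(n_e, m_equal, resync_period(m_equal))   # 56 27 270
```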

It is an interesting fact that self-orthogonal codes can be constructed with $n_A$ much smaller than for the codes used to prove Theorem 10. We shall illustrate the technique for R = 1/2. The method is, for increasing j, to choose $g_j^{(2)} = 1$ when, and only when, the parity check $s_j^{(2)}$ contains no noise variable except $e_0^{(1)}$ in common with the preceding parity checks on $e_0^{(1)}$. Thus, for example, the J = 5 code formed in this manner has the parity triangle


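$$
\begin{array}{r l}
1 \rightarrow & 1 \\
2 \rightarrow & 1\ \boxed{1} \\
 & 0\ 1\ 1 \\
3 \rightarrow & 1\ 0\ \boxed{1}\ \boxed{1} \\
 & 0\ 1\ 0\ 1\ 1 \\
 & 0\ 0\ 1\ 0\ 1\ 1 \\
 & 0\ 0\ 0\ 1\ 0\ 1\ 1 \\
4 \rightarrow & 1\ 0\ 0\ 0\ \boxed{1}\ 0\ \boxed{1}\ \boxed{1} \\
 & 0\ 1\ 0\ 0\ 0\ 1\ 0\ 1\ 1 \\
 & 0\ 0\ 1\ 0\ 0\ 0\ 1\ 0\ 1\ 1 \\
 & 0\ 0\ 0\ 1\ 0\ 0\ 0\ 1\ 0\ 1\ 1 \\
 & 0\ 0\ 0\ 0\ 1\ 0\ 0\ 0\ 1\ 0\ 1\ 1 \\
5 \rightarrow & 1\ 0\ 0\ 0\ 0\ \boxed{1}\ 0\ 0\ 0\ \boxed{1}\ 0\ \boxed{1}\ \boxed{1}
\end{array}
\qquad (90)
$$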


Here, we have adopted the convention of using a numbered arrow to indicate an orthogonal parity check and its size, $n_i$, and of placing a box about each nonzero coefficient (except the coefficient of $e_0^{(1)}$) of an information noise bit in an orthogonal parity check. This convention renders the orthogonality of the set of parity checks readily visible.
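The same bookkeeping can be done by machine. The following minimal Python sketch builds the rate-1/2 parity triangle from the set of exponents j with $g_j^{(2)} = 1$. Entry k of row u is $g_{u-k}^{(2)}$, the coefficient of $e_k^{(1)}$ in $s_u^{(2)}$. The sketch then checks that the rows $s_u^{(2)}$ with $g_u^{(2)} = 1$ share no information noise bit other than $e_0^{(1)}$. The function names are hypothetical.

```python
from typing import Optional


def parity_triangle(taps: set[int]) -> list[list[int]]:
    """Rows 0..m of the R = 1/2 parity triangle; entry k of row u is g_{u-k}."""
    m = max(taps)
    g = [1 if j in taps else 0 for j in range(m + 1)]
    return [[g[u - k] for k in range(u + 1)] for u in range(m + 1)]


def orthogonal_sizes(taps: set[int]) -> Optional[list[int]]:
    """Sizes n_i of the checks s_u (u in taps), or None if two share some e_k, k >= 1."""
    tri = parity_triangle(taps)
    seen: set[int] = set()
    sizes = []
    for u in sorted(taps):
        others = {k for k in range(1, u + 1) if tri[u][k]}
        if others & seen:
            return None
        seen |= others
        sizes.append(len(others) + 1)   # + 1 for the parity noise bit e_u^(2)
    return sizes


code_j5 = {0, 1, 3, 7, 12}
for row in parity_triangle(code_j5):
    print(" ".join(map(str, row)))
print(orthogonal_sizes(code_j5))      # [1, 2, 3, 4, 5]
print(orthogonal_sizes({0, 1, 2}))    # None: s_1 and s_2 both check e_1
```

The printed rows match the triangle (90), and the sizes match the arrow numbers.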

In Table I, we list the set of codes for even values of J up to J = 14 that are obtained by the preceding construction. The constraint length $2^J$ of Eq. 89 is listed for comparison. From Table I, we see that a great improvement in the $n_A/n_E$ ratio is obtained for this second class of self-orthogonal codes. However, the ratio is still large enough to require many additional storage devices in the encoder and decoder beyond those that would be required when the $n_A/n_E$ ratio is almost unity.

Table I. Self-orthogonal codes for rate 1/2.

   J    n_E    n_A      2^J    code
   2      4      4        4    $g_0^{(2)} = g_1^{(2)} = 1$
   4     11     16       16    add $g_3^{(2)}$, $g_7^{(2)}$
   6     22     42       64    add $g_{12}^{(2)}$, $g_{20}^{(2)}$
   8     37     90      256    add $g_{30}^{(2)}$, $g_{44}^{(2)}$
  10     56    162     1024    add $g_{65}^{(2)}$, $g_{80}^{(2)}$
  12     79    238     4096    add $g_{96}^{(2)}$, $g_{118}^{(2)}$
  14    106    356    16384    add $g_{143}^{(2)}$, $g_{177}^{(2)}$

("add" means that the code-generating polynomial is the same as for the previous code with the additions indicated.)
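The construction behind Table I can be sketched as a greedy search. This Python illustration rests on one assumption of ours: the orthogonality test is restated in an equivalent form. The check $s_j^{(2)}$ shares an information noise bit $e_k^{(1)}$ ($k \ge 1$) with an earlier check exactly when one of the new tap differences $j - p$ repeats a difference between taps already chosen. The function name is hypothetical.

```python
def greedy_self_orthogonal(J: int) -> list[int]:
    """Taps of G^(2)(D) chosen greedily as in Sec. 3.5 (R = 1/2)."""
    taps, diffs = [0], set()
    j = 0
    while len(taps) < J:
        j += 1
        new = {j - p for p in taps}
        # s_j shares an information noise bit e_k (k >= 1) with an earlier check
        # exactly when one of the new tap differences repeats an old one.
        if not (new & diffs):
            taps.append(j)
            diffs |= new
    return taps


for J in range(2, 15, 2):
    taps = greedy_self_orthogonal(J)
    n_a = 2 * (taps[-1] + 1)            # n_A = n_0 (m + 1) with n_0 = 2
    n_e = J * (J + 1) // 2 + 1          # the i-th check has size i
    print(J, n_e, n_a, 2 ** J, taps)
```

The printed taps and $n_A$ values can be compared with the code column and the $n_A$ column of Table I. For instance, J = 8 gives the taps 0, 1, 3, 7, 12, 20, 30, 44 and $n_A = 90$.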
3.6 TRIAL-AND-ERROR CODES


Fortunately, the fact that the $n_A/n_E$ ratio is much greater than unity for most self-orthogonal codes does not imply that this must always be the case for other convolutional codes. By trial-and-error techniques, it has been found possible to construct codes for which $n_E$ meets or is smaller than the upper bound of Eq. 77 and for which $n_A/n_E$ is almost unity. An extensive list of such codes is given in Table II for rates 1/2, 1/3, 1/5, and 1/10.


An example will serve both to indicate how Table II is to be read and to give some idea of how the codes were constructed. Consider the R = 1/2, J = 6 code in Table II. The code is listed as $(0, 6, 7, 9, 10, 11)^2$, which is our short notation for indicating that the code-generating polynomial $G^{(2)}(D)$ has as its nonzero coefficients

$$g_0^{(2)} = g_6^{(2)} = g_7^{(2)} = g_9^{(2)} = g_{10}^{(2)} = g_{11}^{(2)} = 1. \qquad (91)$$


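The parity triangle (Eq. 82) then becomes

$$
\begin{array}{r l l}
1 \rightarrow & 1 & \\
 & 0\ 1 & (5) \\
 & 0\ 0\ 1 & \\
 & 0\ 0\ 0\ 1 & (5) \\
 & 0\ 0\ 0\ 0\ 1 & (6) \\
 & 0\ 0\ 0\ 0\ 0\ 1 & \\
2 \rightarrow & 1\ 0\ 0\ 0\ 0\ 0\ \boxed{1} & \\
3 \rightarrow & 1\ \boxed{1}\ 0\ 0\ 0\ 0\ 0\ \boxed{1} & \\
 & 0\ 1\ 1\ 0\ 0\ 0\ 0\ 0\ \boxed{1} & (6) \\
4 \rightarrow & 1\ 0\ \boxed{1}\ \boxed{1}\ 0\ 0\ 0\ 0\ 0\ \boxed{1} & \\
5 \rightarrow & 1\ 1\ 0\ 1\ \boxed{1}\ 0\ 0\ 0\ 0\ 0\ \boxed{1} & (5) \\
6 \rightarrow & 1\ 1\ 1\ 0\ 1\ \boxed{1}\ 0\ 0\ 0\ 0\ 0\ \boxed{1} & (6)
\end{array}
\qquad (92)
$$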


Here, in addition to the convention used in Eq. 90, we have indicated to the right of the triangle which rows are added together to form the orthogonal parity checks. The rules for forming the orthogonal parity checks are listed in the table as 0², 6², 7², 9², 1²3²10², 4²8²11², which is short notation for

$$
\begin{aligned}
A_1 &= s_0^{(2)} \\
A_2 &= s_6^{(2)} \\
A_3 &= s_7^{(2)} \\
A_4 &= s_9^{(2)} \\
A_5 &= s_1^{(2)} + s_3^{(2)} + s_{10}^{(2)} \\
A_6 &= s_4^{(2)} + s_8^{(2)} + s_{11}^{(2)}
\end{aligned}
\qquad (93)
$$
This set of 6 parity checks orthogonal on $e_0^{(1)}$ corresponds to the parity checks constructed in (92). The sizes ($n_i$) of these parity checks are given in the table as 1, 2, 3, 4, 5, 6, which indicates that $A_1$ has $n_1 = 1$, $A_2$ has $n_2 = 2$, etc. The effective constraint length is given in the table as $n_E = 22$. For comparison, the bound of Eq. 77 is listed, and is also $n_E = 22$. The actual constraint length, $n_A = 24$, is also listed.
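The check that the rules of (93) really give parity checks orthogonal on $e_0^{(1)}$ can be mechanized. The following minimal Python sketch assumes the conventions of this section:

- A code is given by the tap sets of $G^{(j)}(D)$.
- $s_u^{(j)}$ checks $e_{u-p}^{(1)}$ for each tap $p \le u$, plus the parity noise bit $e_u^{(j)}$.
- A rule is a list of $(u, j)$ pairs to be added modulo two.

The helper names (`syndrome_bits`, `combine`, `check_sizes`) are hypothetical.

```python
from collections import Counter

Bit = tuple[str, int]   # ("e1", k) is e_k^(1); ("e2", u) is e_u^(2); etc.


def syndrome_bits(code: dict[int, set[int]], u: int, j: int) -> set[Bit]:
    """Noise bits in s_u^(j): e_{u-p}^(1) for each tap p <= u of G^(j), plus e_u^(j)."""
    bits = {("e1", u - p) for p in code[j] if p <= u}
    bits.add((f"e{j}", u))
    return bits


def combine(code: dict[int, set[int]], rule: list[tuple[int, int]]) -> set[Bit]:
    """Modulo-two sum of the syndrome bits s_u^(j) named by the (u, j) pairs."""
    total: set[Bit] = set()
    for u, j in rule:
        total ^= syndrome_bits(code, u, j)
    return total


def check_sizes(code: dict[int, set[int]], rules: list) -> list[int]:
    """Sizes n_i of the combined checks; raises if they are not orthogonal on e_0^(1)."""
    checks = [combine(code, r) for r in rules]
    e0 = ("e1", 0)
    if not all(e0 in c for c in checks):
        raise ValueError("some combination does not check e_0^(1)")
    counts = Counter(b for c in checks for b in c if b != e0)
    if any(n > 1 for n in counts.values()):
        raise ValueError("a noise bit other than e_0^(1) is checked twice")
    return [len(c) - 1 for c in checks]


code = {2: {0, 6, 7, 9, 10, 11}}                 # R = 1/2, J = 6 code of Table II
rules = [[(0, 2)], [(6, 2)], [(7, 2)], [(9, 2)],
         [(1, 2), (3, 2), (10, 2)], [(4, 2), (8, 2), (11, 2)]]
sizes = check_sizes(code, rules)
print(sizes, 1 + sum(sizes))                     # [1, 2, 3, 4, 5, 6] 22
```

The same helper accepts lower-rate codes, for example the R = 1/3, J = 8 entry of Table II, written as `{2: {0, 1, 7}, 3: {0, 2, 3, 4, 6}}`.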

This particular code was found as follows. By beginning the second through the sixth rows of the parity triangle with "zeros," noise bits $e_6^{(1)}$ through $e_{11}^{(1)}$ can have a nonzero coefficient in only one row of the parity triangle of (92). Moreover, the second through the sixth rows can then be used to eliminate one nonzero coefficient for each of the variables $e_1^{(1)}$ through $e_5^{(1)}$, as is evident from the parity triangle. The first row gives a parity check on $e_0^{(1)}$ which checks no other information noise bit. Thus the problem reduces to choosing the last six rows of the parity triangle so that they can be combined to give a set of five parity checks that check $e_0^{(1)}$ but have the property that none of the variables $e_1^{(1)}$ through $e_5^{(1)}$ is checked more than twice in the set. The choice made in Eq. 92 fulfills these requirements.

The  rest  of  the  codes  in  Table  II  were  hand-constructed  by  using  this  and  numerous 
other  techniques  to  exploit  the  parity-triangle  structure.  The  simplicity  of  the  concept 
of  orthogonal  parity  checks  makes  it  possible  to  construct  codes  of  substantial  length  by 
hand. 

A few more remarks on the trial-and-error codes are in order. Consider the code word in a rate $R = 1/n_0$ code for which $i_0^{(1)}$ is the only nonzero information bit. From Eqs. 28 and 29, it follows that the weight of this code word is one plus the number of nonzero terms in the code-generating polynomials $G^{(j)}(D)$, $j = 2, 3, \ldots, n_0$. Thus there must be at least $d - 1$ such nonzero terms, where d is the minimum distance of the code. On the other hand, the existence of J parity checks orthogonal on $e_0^{(1)}$ implies that d is at least $J + 1$, and hence that there must be at least J nonzero terms in the code-generating polynomials.

The trial-and-error codes in Table II (with the few exceptions marked by an asterisk) have the property that the number of nonzero terms in the code-generating polynomials is exactly J, which is the minimum number possible. As can be seen from Figs. 6 and 5, this has the desirable result of reducing the number of inputs to the adders in the encoding circuits to the minimum number possible. Moreover, the minimum distance of these codes is exactly $J + 1$, since there is a code word with $i_0^{(1)} = 1$ which has this weight.
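The weight argument can be illustrated by encoding the single information bit $i_0^{(1)} = 1$. The following Python sketch is a minimal systematic rate-$1/n_0$ encoder. It assumes $T^{(1)}(D) = I(D)$ and $T^{(j)}(D) = G^{(j)}(D)\,I(D)$ modulo two, with each code given by its tap sets. It is not the encoding circuit of Figs. 5 and 6, and the names are hypothetical.

```python
def encode(info: list[int], code: dict[int, set[int]]) -> dict[int, list[int]]:
    """Systematic rate-1/n0 encoder: T^(1)(D) = I(D), T^(j)(D) = G^(j)(D) I(D) mod 2."""
    m = max(max(taps) for taps in code.values())
    length = len(info) + m
    out = {1: info + [0] * m}
    for j, taps in code.items():
        out[j] = [sum(info[t - p] for p in taps if 0 <= t - p < len(info)) % 2
                  for t in range(length)]
    return out


def weight(word: dict[int, list[int]]) -> int:
    """Number of "ones" over all n0 output sequences."""
    return sum(sum(seq) for seq in word.values())


# Code word with i_0^(1) = 1 as the only nonzero information bit.
print(weight(encode([1], {2: {0, 6, 7, 9, 10, 11}})))          # 7 = J + 1 (J = 6)
print(weight(encode([1], {2: {0, 1, 7}, 3: {0, 2, 3, 4, 6}})))  # 9 = J + 1 (J = 8)
```

In both cases the weight is one plus the number of nonzero terms, which equals $J + 1$ for these Table II codes.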

We shall say that a convolutional code with minimum distance d and rate $R = 1/n_0$ can be completely orthogonalized if $d - 1$ parity checks orthogonal on $e_0^{(1)}$ can be formed. In this case, $e_0^{(1)}$ will be correctly decoded by majority decoding for any error pattern that has weight $\lfloor (d-1)/2 \rfloor$ or less. In other words, any error pattern that is guaranteed to be correctable by the minimum distance of the code is also correctable by majority decoding when the code can be completely orthogonalized. With this definition, the
Table II. Trial-and-Error Codes.

RATE   J     n_E   n_E        n_A    CODE
                   (Eq. 77)          RULES FOR ORTHOGONALIZATION
                                     SIZES

1/2    2       4      4          4   (0,1)²
                                     0², 1²
                                     1, 2
       4      11     11         12   (0,3,4,5)²
                                     0², 3², 4², 1²5²
                                     1, 2, 3, 4
       6      22     22         24   (0,6,7,9,10,11)²
                                     0², 6², 7², 9², 1²3²10², 4²8²11²
                                     1, 2, 3, 4, 5, 6
       8      37     37         44   (0,11,13,16,17,19,20,21)²
                                     0², 11², 13², 16², 17², 2²3²6²19², 4²14²20², 1²5²8²15²21²
                                     1, 2, 3, 4, 5, 6, 7, 8
      10      56     56         72   (0,18,19,27,28,29,30,32,33,35)²
                                     0², 18², 19², 27², 1²9²28², 10²20²29², 11²30²31²,
                                     13²21²23²32², 14²33²34², 2²3²16²24²26²35²
                                     1, 2, 3, 4, 5, 6, 7, 8, 9, 10
      12      79     79        104   (0,26,27,39,40,41,42,44,45,47,48,51)²
                                     0², 26², 27², 39², 1²13²40², 14²28²41², 15²42²43²,
                                     17²29²31²44², 18²45²46², 2²3²20²32²34²47²,
                                     21²35²48²49²50², 24²30²33²36²38²51²
                                     1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

1/3    2       3      3          3   (0)²(0)³
                                     0², 0³
                                     1, 1
       4       7      7          9   (0,1)²(0,2)³
                                     0², 0³, 1², 2³
                                     1, 1, 2, 2
       6      13     13         15   (0,1)²(0,2,3,4)³
                                     0², 0³, 1², 2³, 1³3³, 2²4³
                                     1, 1, 2, 2, 3, 3
       8      20     21         24   (0,1,7)²(0,2,3,4,6)³
                                     0², 0³, 1², 2³, 1³3³, 2²4³, 7², 3²5²6²6³
                                     1, 1, 2, 2, 3, 3, 3, 4
Table II. (continued).

RATE   J     n_E   n_E        n_A    CODE
                   (Eq. 77)          RULES FOR ORTHOGONALIZATION
                                     SIZES

1/3   10      30     31         33   (0,1,9)²(0,1,2,3,5,8,9)³
                                     0², 0³, 1², 2²2³, 9², …
                                     1, 1, 2, 2, 3, 3, 3, 4, 5, 5
      12*     40     43         54   (0,4,5,6,7,9,12,13,16)²(0,1,14,15,16)³
                                     0², 0³, 1²1³, 4², 5², 2³6², 14³, …
                                     1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6
      14*     55     57         69   (0,4,5,6,7,9,12,13,16,19,20,21)²(0,1,20,22)³
                                     0², 0³, 1²1³, 4², 5², 2³6², …
                                     1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 7, 8
      16      68     73        108   (0,4,5,6,7,9,12,16,17,30,31)²(0,1,22,25,35)³
                                     0², 0³, 1²1³, 4², 5², 2³6², 22³, …
                                     1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 7, 8, 9
Table II. (continued).

RATE   J     n_E   n_E        n_A    CODE
                   (Eq. 77)          RULES FOR ORTHOGONALIZATION
                                     SIZES

1/5    4       5      5          5   (0)²(0)³(0)⁴(0)⁵
                                     0², 0³, 0⁴, 0⁵
                                     1, 1, 1, 1
       6       9      9         10   add (1)²(1)³
                                     1²1⁴, 1³1⁵
                                     2, 2
       8      13     13         15   add (2)²(2)⁴
                                     2²2³, 2⁴2⁵
                                     2, 2
      10      18     19         20   add (3)²(3)⁵
                                     3⁵, 3²3³
                                     2, 3
      13      27     29         30   add (4)²(5)⁴(5)⁵
                                     3⁴4²4⁴, 5⁵, 4³5³5⁴
                                     3, 3, 3
      14      30     33         35   add (6)⁴
                                     4⁵6⁴
                                     3
      16      37     41         45   add (8)³(7)⁴
                                     8³, 5²6³7²7⁴
                                     3, 4
      18      47     51         55   (0,1,2,3,5,6,8,10)²(0,3,5,6,8)³ …
                                     0², 0³, 0⁴, 0⁵, …
                                     1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 5, 5
      20      55     61         65   add (10)⁴(12)³
                                     …
                                     4, 4
      22*     65     73         80   add (11,13,14)²(15)⁵
                                     …
                                     4, 6

1/10   9      10     10         10   (0)²(0)³(0)⁴(0)⁵(0)⁶(0)⁷(0)⁸(0)⁹(0)¹⁰
                                     0², 0³, 0⁴, 0⁵, 0⁶, 0⁷, 0⁸, 0⁹, 0¹⁰
                                     1, 1, 1, 1, 1, 1, 1, 1, 1
      14      20     20         20   add (1)⁶(1)⁷(1)⁸(1)⁹(1)¹⁰
                                     1²1⁶, 1³1⁷, 1⁴1⁸, 1⁵1⁹, 1¹⁰
                                     2, 2, 2, 2, 2
      18      28     28         30   add (2)²(2)³(2)⁶(2)⁹
                                     2², 2³2⁴, 2⁶2⁷, 2⁸2⁹
                                     2, 2, 2, 2
Table II. (continued).

RATE   J     n_E   n_E        n_A    CODE
                   (Eq. 77)          RULES FOR ORTHOGONALIZATION
                                     SIZES

1/10  23      39     43         40   add (3)²(3)⁴(3)⁸(3)⁹(3)¹⁰
                                     3²3³, 3⁴, 3⁷3⁸, 3⁶3⁹, 2⁵3⁵3¹⁰
                                     2, 2, 2, 2, 3
      26      48     52         50   add (4)²(4)⁸(4)⁹
                                     4⁸4¹⁰, 2¹⁰4², 4³4⁴4⁷4⁹
                                     2, 3, 4
      33      69     79         70   add (6)²(5)⁵(5,6)⁶(5,6)⁹(6)¹⁰
                                     …
                                     2, 3, 3, 3, 3, 3, 4
      38      88    101         90   add (7)⁶(7,8)⁹(8)⁷(8)⁸
                                     5¹⁰7⁶7¹⁰, 8⁷, …
                                     3, 3, 4, 4, 5
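Codes marked with an asterisk have J + 2 nonzero terms in the code-generating polynomials; all other codes have the minimum possible number of nonzero terms, namely J.

The symbols used in Table II have the following meanings:

J = number of parity checks orthogonal on $e_0^{(1)}$,
$n_E$ = effective constraint length of the code,
$n_A$ = actual constraint length of the code.

In the CODE column, for example, (0,6,7,9,10,11)² lists the exponents of the nonzero terms of $G^{(2)}(D)$, and "add" has the same meaning as in Table I. In the RULES column, 1²3²10² denotes $s_1^{(2)} + s_3^{(2)} + s_{10}^{(2)}$.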
codes of Table II (with the possible exceptions of those codes marked with an asterisk)
can  be  completely  orthogonalized. 
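For the R = 1/2, J = 6 code of Table II (d = 7), the claim can be checked exhaustively over the first constraint length. The following Python sketch makes two assumptions. First, the noise bits considered are $e_k^{(1)}$ and $e_k^{(2)}$ for $k = 0, \ldots, m$. Second, the majority rule decides $e_0^{(1)} = 1$ only when a strict majority of the $A_i$ are "ones." The names are hypothetical.

```python
from itertools import combinations

TAPS = {0, 6, 7, 9, 10, 11}                          # G^(2)(D), R = 1/2, J = 6, d = 7
RULES = [[0], [6], [7], [9], [1, 3, 10], [4, 8, 11]]  # indices u of s_u^(2) in each A_i
M = max(TAPS)
# Noise bits in the first constraint length: e_k^(1) and e_k^(2), k = 0..m.
BITS = [(seq, k) for seq in (1, 2) for k in range(M + 1)]


def syndromes(errors: set) -> list[int]:
    """s_u^(2) = sum over taps p <= u of e_{u-p}^(1), plus e_u^(2), modulo two."""
    return [(sum((1, u - p) in errors for p in TAPS if p <= u)
             + ((2, u) in errors)) % 2 for u in range(M + 1)]


def decode_e0(errors: set) -> int:
    """Majority decision on e_0^(1) from the J orthogonal checks A_i."""
    s = syndromes(errors)
    a = [sum(s[u] for u in rule) % 2 for rule in RULES]
    return int(sum(a) > len(a) // 2)   # "one" only if a strict majority say "one"


failures = sum(
    decode_e0(set(pattern)) != int((1, 0) in pattern)
    for w in range(4)                  # every weight up to (d - 1) // 2 = 3
    for pattern in combinations(BITS, w)
)
print(failures)   # 0
```

Weight 4 is no longer safe. For example, errors in the four parity noise bits $e_0^{(2)}, e_6^{(2)}, e_7^{(2)}, e_9^{(2)}$ make four of the six checks "ones," and the decoder wrongly decides $e_0^{(1)} = 1$.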

3.7  UNIFORM  CONVOLUTIONAL  CODES 

In contrast to the codes of the previous section, which were constructed by cut-and-try procedures, we shall next study some classes of convolutional codes for which a systematic formulation can be given and which can be completely orthogonalized. In this section, we show the existence of a class of convolutional codes that satisfy the distance bound of Theorem 7 with the equality sign, and which can be completely orthogonalized.

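For any integers L and M, we consider the set of $L2^M - 1$ binary (M+1)-tuples formed as shown below for L = 2 and M = 2.

$$
\begin{array}{r ccc l}
 & 0 & 0 & 1 & \\
 & 0 & 1 & 1 & \\
2 \rightarrow & 1 & 0 & 1 & (+\ 0\,0\,1) \\
2 \rightarrow & 1 & 1 & 1 & (+\ 0\,1\,1) \\[4pt]
 & 0 & 1 & 1 & \\
2 \rightarrow & 1 & 0 & \boxed{1} & \\
2 \rightarrow & 1 & 1 & 1 & (+\ 0\,1\,1)
\end{array}
\qquad (94)
$$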


That is, there are L sets of rows formed, and every row has a "one" in the last, or (M+1)th, position. The first L − 1 sets of rows each contain the set of all $2^M$ M-tuples in the first M positions. The last set contains all $2^M - 1$ nonzero M-tuples in the first M positions.

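Now suppose that the rows in (94) are the last rows in a set of $L2^M - 1$ parity triangles. The complete parity triangles then must be

$$
\begin{array}{l} 1 \\ 0\ 1 \\ 0\ 0\ 1 \end{array}
\quad
\begin{array}{l} 1 \\ 1\ 1 \\ 0\ 1\ 1 \end{array}
\quad
\begin{array}{l} 1 \\ 0\ 1 \\ 1\ 0\ 1 \end{array}
\quad
\begin{array}{l} 1 \\ 1\ 1 \\ 1\ 1\ 1 \end{array}
\qquad
\begin{array}{l} 1 \\ 1\ 1 \\ 0\ 1\ 1 \end{array}
\quad
\begin{array}{l} 1 \\ 0\ 1 \\ 1\ 0\ 1 \end{array}
\quad
\begin{array}{l} 1 \\ 1\ 1 \\ 1\ 1\ 1 \end{array}
$$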
These are the parity triangles corresponding to the matrix of a code with rate $R = 1/n_0$, where $n_0 = L2^M$. We now determine the number of parity checks orthogonal on $e_0^{(1)}$ that can be formed for this code.

The last rows in the parity triangles can be added as shown in (94) to produce $L2^{M-1}$ parity checks of size two orthogonal on $e_0^{(1)}$. That the last rows can always be so combined is proved as follows.

There are L − 1 sets of last rows, each having $2^M$ rows. These rows all have "ones" in the last position and have the set of all $2^M$ M-tuples in the first M positions. In each such set of rows, for every row beginning with a "one" there is a unique row beginning with a "zero" that is otherwise identical. The sum (modulo-two) of each such pair of rows corresponds to a parity check that checks $e_0^{(1)}$ and no other information noise bits. The size of the parity check is two because two parity checks are added, and this implies that two parity noise bits are checked. Thus $2^{M-1}$ parity checks of size two orthogonal on $e_0^{(1)}$ can be formed from each of these L − 1 sets of last rows.

Finally, there is one set of last rows, all of which have "ones" in the last position and have the set of all $2^M - 1$ nonzero M-tuples in the first M positions. The parity checks in this set can be combined as before, except that, because the row 0 0 ... 0 1 is missing, the row 1 0 ... 0 1 must be used alone. This row corresponds to a parity check on $e_0^{(1)}$ and $e_M^{(1)}$. Its size is still two: one parity noise bit plus $e_M^{(1)}$. Thus an additional set of $2^{M-1}$ parity checks orthogonal on $e_0^{(1)}$ can be formed from this set of last rows. In all, $L2^{M-1}$ parity checks orthogonal on $e_0^{(1)}$ can be formed by using only the last rows of the parity triangles, and $e_M^{(1)}$ is the only information noise bit, exclusive of $e_0^{(1)}$, that is checked.
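The pairing argument can be checked directly on the rows of (94). The following minimal Python sketch generates the L sets of (M+1)-tuples. It pairs each row beginning with a "one" with its twin beginning with a "zero," and uses the row 1 0 ... 0 1 of the last set alone. The function names are hypothetical.

```python
from itertools import product

Row = tuple[int, ...]


def last_rows(L: int, M: int) -> list[list[Row]]:
    """The L sets of (M+1)-tuples in (94); every row ends in a 'one'."""
    full = [t + (1,) for t in product((0, 1), repeat=M)]
    nonzero = [r for r in full if any(r[:M])]
    return [full] * (L - 1) + [nonzero]


def size_two_checks(L: int, M: int) -> list[tuple[Row, ...]]:
    """Pair each row beginning with 'one' with its twin beginning with 'zero'."""
    checks = []
    for rows in last_rows(L, M):
        for r in rows:
            if r[0] == 1:
                twin = (0,) + r[1:]
                # In the last set the twin of 1 0 ... 0 1 is missing: use it alone.
                checks.append((r, twin) if twin in rows else (r,))
    return checks


def mod2_sum(rows) -> Row:
    """Componentwise modulo-two sum of a collection of rows."""
    return tuple(sum(col) % 2 for col in zip(*rows))


L, M = 2, 2
checks = size_two_checks(L, M)
print(len(checks), L * 2 ** (M - 1))      # 4 4
print([mod2_sum(c) for c in checks])      # [(1, 0, 0), (1, 0, 0), (1, 0, 1), (1, 0, 0)]
```

Each paired sum is 1 0 ... 0, a check on $e_0^{(1)}$ alone. The single unpaired row sums to 1 0 ... 0 1, which also checks $e_M^{(1)}$.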

The same process can then be repeated for the next-to-last rows of the parity triangles. If the last rows are as given in (94), then the next-to-last rows are

$$
\begin{array}{cc}
0 & 1 \\ 1 & 1 \\[4pt]
0 & 1 \\ 1 & 1 \\[4pt]
0 & 1 \\ 1 & 1 \\[4pt]
1 & 1
\end{array}
\qquad (95)
$$

This is another set of rows of the same type as (94), with $L' = 2L$ and $M' = M - 1$. Thus another set of $L'2^{M'-1} = L2^{M-1}$ parity checks of size two orthogonal on $e_0^{(1)}$ can be formed from the next-to-last rows of the parity triangles. The union of this set with the set of parity checks formed from the last rows is a set of $2L2^{M-1}$ parity checks orthogonal on $e_0^{(1)}$. This holds because the only information noise bit checked by the former set, $e_{M-1}^{(1)}$, is distinct from the only information noise bit checked by the latter set, $e_M^{(1)}$.

This same process can be repeated for the third-to-last, the fourth-to-last, etc., rows of the parity triangles, until the first rows are reached. The process is thus performed a total of M times, giving $ML2^{M-1}$ parity checks of size two in all. The first rows each correspond to a parity check that checks $e_0^{(1)}$ and no other information noise bit. Thus an additional $L2^M - 1$ checks of size one orthogonal on $e_0^{(1)}$ can be formed from these rows.

The total number of parity checks orthogonal on $e_0^{(1)}$ that can be formed from the parity triangles then is

$$J = ML2^{M-1} + (L2^M - 1) = L(M+2)2^{M-1} - 1. \qquad (96)$$

Using Eq. 75, we find the effective constraint length to be

$$n_E = 1 + 2ML2^{M-1} + (L2^M - 1) = L(M+1)2^M. \qquad (97)$$

For this code, we have m = M and $n_0 = L2^M$; hence

$$n_A = (m+1)n_0 = L(M+1)2^M = n_E. \qquad (98)$$

By the corollary to Theorem 1, the minimum distance, d, must be at least J + 1, or $L(M+2)2^{M-1}$. On the other hand, from Eq. 53, we find that

$$d_{avg} = \tfrac{1}{2}(n_A + n_0) = L(M+2)2^{M-1}. \qquad (99)$$

Thus the minimum distance must be exactly $L(M+2)2^{M-1}$, since it cannot exceed the average distance.

We  summarize  these  results  in  Theorem  13. 

THEOREM 13: For any integers L and M, there exists a convolutional code with $R = 1/n_0$ and $n_0 = L2^M$, and with constraint length $n_A = n_E = L(M+1)2^M$, which is such that the minimum distance d satisfies

$$d = d_{avg} = L(M+2)2^{M-1},$$

and so that the code can be completely orthogonalized.
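The parameters in Theorem 13 follow from Eqs. 96 through 99 by simple arithmetic. The following minimal Python sketch evaluates them and, for L = 1, reproduces the entries of Table III. It assumes $M \ge 1$, so that $2^{M-1}$ is an integer. The function name `uniform_code` is hypothetical.

```python
def uniform_code(L: int, M: int) -> dict[str, int]:
    """Parameters of the Theorem 13 code (R = 1/n0), from Eqs. 96-99; needs M >= 1."""
    n0 = L * 2 ** M
    J = M * L * 2 ** (M - 1) + (L * 2 ** M - 1)            # Eq. 96
    n_e = 1 + 2 * M * L * 2 ** (M - 1) + (L * 2 ** M - 1)    # Eq. 97
    n_a = (M + 1) * n0                                       # Eq. 98 with m = M
    d_avg = (n_a + n0) // 2                                  # Eq. 99; n_a + n0 is even
    return {"n0": n0, "J": J, "n_E": n_e, "n_A": n_a, "d": d_avg}


for M in range(1, 7):                     # the L = 1 codes of Table III
    p = uniform_code(1, M)
    print(M, f"1/{p['n0']}", p["n_A"], p["d"])
```

The same function confirms, for any L and M, that $J + 1 = d$ (complete orthogonalization) and $n_E = n_A$.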

We call a convolutional code with $d = d_{avg}$ a uniform code. It seems probable that there are no uniform binary convolutional codes with $R = 1/n_0$ other than those given by Theorem 13. In Table III we list the set of codes with L = 1 for M up to 6.

Table III. Some uniform binary convolutional codes.

  M     R      n_A     d
  1    1/2       4     3
  2    1/4      12     8
  3    1/8      32    20
  4    1/16     80    48
  5    1/32    192   112
  6    1/64    448   256


3.8 "REED-MULLER-LIKE" CODES


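Another class of convolutional codes that can be completely orthogonalized will now be presented. The method of construction is best understood through an example. Consider the array

$$
\begin{array}{ccc|ccc|c}
v_2v_3 & v_1v_3 & v_1v_2 & v_1 & v_2 & v_3 & 1 \\ \hline
1 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 & 1 & 0 & 1 \\
0 & 1 & 0 & 1 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 1
\end{array}
\qquad (100)
$$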


This array is formed as follows. The columns $v_1$, $v_2$, and $v_3$ are chosen so that the set of 3-tuples in their rows is the set of all nonzero 3-tuples. The all-"one" column vector is then added to the list, together with the column vectors formed by taking all vector inner products of $v_1$, $v_2$, and $v_3$ taken two at a time. Again suppose these are the last rows in a set of $n_0 - 1$ parity triangles. Then, as shown above, we can add these rows to form a set of $2^{3-2} = 2$ parity checks orthogonal on $e_0^{(1)}$, each of size $2^2 = 4$. Except for $e_0^{(1)}$, the only information noise bit checked by these parity checks is $e_6^{(1)}$. The same process can be repeated in the second-to-last and third-to-last rows of the parity triangles. The remaining rows of the parity triangles are the same as those for the L = 1, M = 3 uniform code in section 3.7. Thus, in all, the following parity checks, all orthogonal on $e_0^{(1)}$, can be formed for this code:

- $\binom{3}{2}2^{3-2}$ parity checks of size $2^2$,
- $\binom{3}{1}2^{3-1}$ parity checks of size $2^1$,
- $\binom{3}{0}2^{3} - 1$ parity checks of size $2^0$.
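The grouping used in this example can be checked on the array (100). The following Python sketch is a minimal illustration with two assumptions of ours. First, the column order is the one shown in (100): $v_2v_3$, $v_1v_3$, $v_1v_2$, $v_1$, $v_2$, $v_3$, 1. Second, the rows are grouped by the value of $v_1$, the only coordinate absent from the first product. This exhibits the grouping for this one example only and does not reproduce Reed's general proof.

```python
from itertools import product

M = 3
# Columns of the array, as index sets of the v's multiplied together (0 is v1):
# v2v3, v1v3, v1v2, then v1, v2, v3, then the all-"one" column (empty product).
COLUMNS = [(1, 2), (0, 2), (0, 1), (0,), (1,), (2,), ()]


def array_row(v: tuple[int, ...]) -> tuple[int, ...]:
    """Row of (100) for the nonzero 3-tuple v = (v1, v2, v3)."""
    return tuple(int(all(v[i] for i in col)) for col in COLUMNS)


def mod2_sum(rows) -> tuple[int, ...]:
    """Componentwise modulo-two sum of a collection of rows."""
    return tuple(sum(col) % 2 for col in zip(*rows))


rows = {v: array_row(v) for v in product((0, 1), repeat=M) if any(v)}

# Group the rows by the coordinates that do not appear in the first column's
# product (here only v1); each group sums to a vector with a "one" in column 1.
first = COLUMNS[0]
free = [i for i in range(M) if i not in first]
groups: dict[tuple[int, ...], list] = {}
for v, r in rows.items():
    groups.setdefault(tuple(v[i] for i in free), []).append(r)

for key, members in sorted(groups.items()):
    print(key, len(members), mod2_sum(members))
# (0,) 3 (1, 0, 0, 0, 0, 0, 1)
# (1,) 4 (1, 0, 0, 0, 0, 0, 0)
```

The group of three rows gives the check on $e_0^{(1)}$ and $e_6^{(1)}$ (three parity noise bits plus $e_6^{(1)}$, size 4). The group of four rows gives a check on $e_0^{(1)}$ alone (four parity noise bits, size 4).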

When this example is generalized so that the array contains all vector inner products of $v_1, v_2, \ldots, v_M$ taken K or fewer at a time, we obtain Theorem 14.

THEOREM 14: For any integers M and K with the property that $K \le M$, there exists a convolutional code with $R = 1/2^M$ and

$$n_A = n_E = 2^M \sum_{j=0}^{K} \binom{M}{j}$$

so that

$$d = \sum_{j=0}^{K} \binom{M}{j} 2^{M-j}, \qquad (101)$$

and so that the code can be completely orthogonalized.

PROOF 14: The proof involves two steps. First, we must show that the number, J, of parity checks orthogonal on $e_0^{(1)}$ is one less than the sum in (101). Second, we must show that the minimum distance of the code is exactly equal to this sum.

We begin by noting that if a lowermost row of 0 0 0 0 0 0 1 is added to the array in (100), it becomes the transpose of the generating matrix for a second-order (K = 2) Reed-Muller code. (This is why we have referred to the codes in this section as "Reed-Muller-like codes.") Reed has shown that, for the generating matrix of the Kth-order code, the columns can be grouped into $2^{M-K}$ sets of $2^K$ columns each, in such a manner that the sum of the columns in each set has a "one" only in the first row. (Reed's proof is quite lengthy and will not be reproduced here.) Thus the rows in the transposed matrix can be grouped into $2^{M-K}$ sets, of $2^K$ rows each, in such a manner that the sum of the rows in each set has a "one" only in the first column. If the row 0 0 ... 0 1 is omitted from the transposed matrix, one set of rows will contain $2^K - 1$ rows, and the sum of these rows will be 1 0 ... 0 1. Thus, suppose that the rows in the transposed matrix, with the row 0 0 ... 0 1 omitted, are the last rows in a set of parity triangles. It is then possible to form $2^{M-K}$ parity checks orthogonal on $e_0^{(1)}$, each of size $n_i = 2^K$, such that a single information noise bit, excluding $e_0^{(1)}$, is checked by these $2^{M-K}$ parity checks.

The same process can be carried out on the second-to-last, the third-to-last, etc., rows of the parity triangles. After $\binom{M}{K}$ rows have been processed, the remaining rows reduce to the (K−1)th-order code. The union of the sets of orthogonal parity checks formed from each set of rows of the parity triangles is itself orthogonal on $e_0^{(1)}$. This is because only one information noise bit is checked by the set formed from each set of rows, and these bits are all distinct.

The total number, J, of parity checks orthogonal on $e_0^{(1)}$ that are formed in this manner is

$$J = \sum_{j=0}^{K} \binom{M}{j} 2^{M-j} - 1, \qquad (102)$$

which is one less than the sum in Eq. 101, as claimed.

It remains to show that the minimum distance is given exactly by (101). The minimum distance is at least J + 1, and hence must be at least as large as the sum in (101). It suffices, then, to show that there is some code word with $i_0^{(1)} = 1$ which has exactly this weight.

We recall that there is an initial code word, with $i_0^{(1)} = 1$, whose weight is one plus the number of "ones" in the code-generating polynomials. This last number is just the number of "ones" in the first columns of the parity triangles. By the symmetry of each triangle, it is equal to the number of "ones" in the last rows of the parity triangles. The Reed-Muller generating matrix has one more "one" than the set of last rows of the parity triangles, since it includes an extra column 0 0 ... 0 1. It follows that there is a code word with $i_0^{(1)} = 1$ whose weight equals the number of "ones" in the generating matrix of a Kth-order Reed-Muller code. But any vector inner product of $v_1, v_2, \ldots, v_M$ taken j at a time contains exactly $2^{M-j}$ "ones." The number of "ones" in the generating matrix then is

$$\sum_{j=0}^{K} \binom{M}{j} 2^{M-j}, \qquad (103)$$

and thus there is an initial code word, with $i_0^{(1)} = 1$, having this weight. This equals the sum in Eq. 101, which was to be shown.
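Theorem 14 and Eq. 102 can be evaluated directly. The following minimal Python sketch computes $n_A$ as $n_0$ times the number of columns of the array. It computes $n_E$ from the sizes of the orthogonal checks ($\binom{M}{j}2^{M-j}$ checks of size $2^j$ for $j \ge 1$, plus $2^M - 1$ checks of size one). It computes d from (101). It assumes $1 \le K \le M$, and the function name `rm_like` is hypothetical.

```python
from math import comb


def rm_like(M: int, K: int) -> dict[str, int]:
    """Parameters of the Theorem 14 code with R = 1/2^M (assumes 1 <= K <= M)."""
    n0 = 2 ** M
    n_a = n0 * sum(comb(M, j) for j in range(K + 1))          # n0 (m + 1)
    d = sum(comb(M, j) * 2 ** (M - j) for j in range(K + 1))  # Eq. 101
    J = d - 1                                                 # Eq. 102
    # n_E = 1 + sum of check sizes: C(M, j) 2^(M-j) checks of size 2^j for
    # j >= 1, plus 2^M - 1 checks of size one.
    n_e = (1 + sum(comb(M, j) * 2 ** (M - j) * 2 ** j for j in range(1, K + 1))
           + (n0 - 1))
    return {"n0": n0, "n_A": n_a, "n_E": n_e, "d": d, "J": J}


for M in range(1, 6):
    for K in range(1, M + 1):
        p = rm_like(M, K)
        print(M, K, f"1/{p['n0']}", p["n_A"], p["d"])
```

The printed rows agree with Table IV. The computed $n_E$ always equals $n_A$, as Theorem 14 states.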

In Table IV, we list the Reed-Muller-like convolutional codes for M up to 5. The M, K code with K = 1 is the same as the M, L uniform code (in section 3.7) with L = 1.

There are important differences between the codes in this section and the ordinary Reed-Muller codes, besides the obvious one that the former are convolutional codes and the latter are block codes. With fixed M, the block length of the Reed-Muller codes is $2^M$, and adding the higher-order vector products to the generating matrix increases the rate but reduces the minimum distance. On the other hand, for the Reed-Muller-like convolutional codes, it is the rate $R = 1/2^M$ that is fixed, and adding the higher-order vector inner products increases both the constraint length and the minimum distance.


Table IV. Reed-Muller-like convolutional codes.

  M   K    R      n_A = n_E     d
  1   1   1/2          4        3
  2   1   1/4         12        8
  2   2   1/4         16        9
  3   1   1/8         32       20
  3   2   1/8         56       26
  3   3   1/8         64       27
  4   1   1/16        80       48
  4   2   1/16       176       72
  4   3   1/16       240       80
  4   4   1/16       256       81
  5   1   1/32       192      112
  5   2   1/32       512      192
  5   3   1/32       832      232
  5   4   1/32       992      242
  5   5   1/32      1024      243

3.9  SUMMARY 

We  have  constructed  a  fairly  extensive  set  of  codes,  suitable  for  threshold  decoding, 
by  both  cut-and-try  and  analytical  procedures.  For  the  case  R  =  1/2,  we  have  been  able 
to  show  that  the  effective  constraint  length  must  grow  as  the  square  of  the  number  of 
constructed  orthogonal  parity  checks,  and  to  form  codes  that  achieved  this  bound.  For 
other  rates,  we  have  not  been  able  to  derive  a  good  lower  bound  on  effective  constraint 
length,  but  we  conjecture  that  a  similar  bound  applies  as  the  number  of  orthogonal 
parity  checks  increases  indefinitely. 

The upper bound of Theorem 10 was found to give the smallest possible value of $n_E$ for R = 1/2, but not for lower rates. For example, in the limit as $M \to \infty$, the uniform codes with L = 1 have an effective constraint length that is approximately 2/M times the bound of Theorem 10, but these are very low-rate codes. The effective constraint length of the Reed-Muller-like codes is also much smaller than the bound of Theorem 10 for the very low-rate codes.
The construction used to prove Theorem 10 gave codes for which the effective constraint length, $n_E$, was much smaller than the actual constraint length, $n_A$. On the other hand, the trial-and-error codes were formed so that $n_E$ and $n_A$ were approximately equal. For both the uniform codes and the Reed-Muller-like codes, $n_E$ and $n_A$ were identical. We have already emphasized the importance of having $n_A$ as small as possible.

A  final  remark  on  the  trial-and-error  codes  of  Table  II  seems  to  be  in  order. 
These  codes  were  all  hand-constructed,  by  using  techniques  of  the  nature  described  in 
section  3.6.  For  several  reasons,  it  does  not  seem  feasible  to  make  a  computer  search 
for  such  codes.  The  special  technique  by  which  each  code  was  constructed  was  developed 
after  a  careful  study  of  the  appropriate  parity  triangles,  and  no  systematic  set  of  rules 
was  found  which  would  enable  the  construction  of  all  codes  in  Table  II,  or  even  of  all  of 
the  codes  of  a  fixed  rate.  Many  of  the  codes  required  somewhat  "ingenious"  tricks  in 
their  construction  which  would  not  be  readily  programmable.  At  best,  it  seems  that  a 
computer  could  be  used  for  making  searches  for  some  of  the  longer  codes  with  a  high 
degree  of  operator  control  of  the  program. 

In  Section  IV  we  shall  give  threshold-decoding  circuits  that  can  be  used  with  any  of 
the  codes  in  Section  III.  Following  that,  we  shall  give  numerical  and  analytical  results 
for  the  error  probabilities  that  can  be  obtained  when  the  codes  in  Section  III  are 
threshold-decoded. 
IV. THRESHOLD-DECODING CIRCUITS FOR BINARY CONVOLUTIONAL CODES


We shall present digital circuits that can be used to instrument the threshold-decoding algorithms for binary convolutional codes.

It  is  convenient  at  this  point  to  state  and  prove  a  lemma  that  will  be  used  frequently 
in  the  following  sections. 

LEMMA 2: Let $e_1, e_2, \ldots, e_{n_i}$ be a set of $n_i$ statistically independent, binary-valued random variables for which $\Pr[e_j = 1] = 1 - \Pr[e_j = 0] = \gamma_j$. Then the probability, $p_i$, that an odd number of these random variables are "ones" is

$$p_i = \frac{1}{2}\left[1 - \prod_{j=1}^{n_i}(1 - 2\gamma_j)\right]. \tag{104}$$

PROOF: The technique of "enumerating, or generating, functions" is useful here. The enumerating function, $g_j(s)$, for the $j$th random variable is $\gamma_j s + (1 - \gamma_j)$, that is, it is the polynomial in $s$ for which the coefficient of $s^v$ is the probability that the random variable takes on value $v$. Since the random variables are statistically independent, the enumerating function, $g(s)$, of their sum as real numbers is

$$g(s) = \prod_{j=1}^{n_i} g_j(s) = \prod_{j=1}^{n_i}\left[\gamma_j s + (1 - \gamma_j)\right]. \tag{105}$$

The desired probability, $p_i$, is then the sum of the coefficients of odd powers of $s$ in $g(s)$, hence

$$p_i = \frac{1}{2}\left[g(1) - g(-1)\right]. \tag{106}$$

Substituting (105) in (106), and noting that $g(1) = 1$ and $g(-1) = \prod_{j=1}^{n_i}(1 - 2\gamma_j)$, we obtain Eq. 104.
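Lemma 2 is easy to check numerically. The sketch below computes the odd-parity probability in three ways: from the closed form (104), by multiplying out the enumerating functions of (105) and keeping the odd coefficients as in (106), and by brute-force enumeration. The function names and the $\gamma$ values are illustrative choices and do not come from the text.

```python
from itertools import product
from math import prod


def odd_parity_closed_form(gammas):
    """Eq. 104: Pr[an odd number of the independent bits are ones]."""
    return 0.5 * (1.0 - prod(1.0 - 2.0 * g for g in gammas))


def odd_parity_enumerating(gammas):
    """Eqs. 105-106: multiply g_j(s) = gamma_j*s + (1 - gamma_j), keep odd powers."""
    coeffs = [1.0]                      # coeffs[v] = Pr[sum of the bits == v]
    for g in gammas:
        new = [0.0] * (len(coeffs) + 1)
        for v, c in enumerate(coeffs):
            new[v] += c * (1.0 - g)     # this bit is a zero
            new[v + 1] += c * g         # this bit is a one
        coeffs = new
    return sum(c for v, c in enumerate(coeffs) if v % 2 == 1)


def odd_parity_brute_force(gammas):
    """Direct sum over all 2**n outcomes (only practical for small n)."""
    total = 0.0
    for bits in product((0, 1), repeat=len(gammas)):
        if sum(bits) % 2 == 1:
            total += prod(g if b else 1.0 - g for g, b in zip(gammas, bits))
    return total


gammas = [0.1, 0.25, 0.05, 0.3]
print(odd_parity_closed_form(gammas),
      odd_parity_enumerating(gammas),
      odd_parity_brute_force(gammas))
```

All three values agree to rounding error. The enumerating-function version is the one that scales, because it never lists the $2^n$ outcomes.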

Before proceeding to a specification of actual decoding circuits, we shall first restate the threshold-decoding algorithms of Theorems 3 and 4 in their specific form for binary convolutional codes with rate $R = 1/n_o$. For this case, given a set $\{A_i\}$ of $J$ parity checks orthogonal on $e_0^{(1)}$, the threshold-decoding algorithms can be stated thus:

Choose $e_0^{(1)*} = 1$ if, and only if,

$$\sum_{i=1}^{J} w_i A_i > T. \tag{107}$$

Here,

(a) the $\{A_i\}$ are treated as real numbers in this sum;

(b) the $w_i$ are the weighting factors and are given by

$$w_i = 1 \quad \text{for majority decoding, and} \tag{108}$$

$$w_i = 2\log\frac{q_i}{p_i} \quad \text{for APP decoding;} \tag{109}$$

$p_i = 1 - q_i$ is the probability of an odd number of ones in the $n_i$ noise bits, exclusive of $e_0^{(1)}$, which are checked by $A_i$; and

(c) $T$ is the threshold and is given by

$$T = \frac{J}{2} \quad \text{for majority decoding, and} \tag{110}$$

$$T = \frac{1}{2}\sum_{i=0}^{J} w_i \quad \text{for APP decoding,} \tag{111}$$

where $p_0 = 1 - q_0 = \Pr[e_0^{(1)} = 1]$, and we set $w_0 = 2\log(q_0/p_0)$.

This algorithm is just a rewording of the decoding rules of Theorems 3 and 4, and we shall now consider specific circuits for its implementation. For convenience, we shall restrict the discussion to rates $R = 1/n_o$, and indicate afterwards how the general case, $R = k_o/n_o$, is handled.
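Here is a minimal sketch of the decision rule (107). It assumes the $p_i$ and $p_0$ are already known. The hypothetical helpers below compute the weights and the threshold of Eqs. 108-111 and then apply the threshold test. The numbers in the example are arbitrary.

```python
import math


def weights_and_threshold(p, p0, rule):
    """Weights w_1..w_J and threshold T of Eqs. 108-111.

    p  -- list of p_i (probability of odd parity in the noise bits of A_i,
          exclusive of e_0^(1)); only APP decoding uses it
    p0 -- Pr[e_0^(1) = 1]
    """
    J = len(p)
    if rule == "majority":
        return [1.0] * J, J / 2                          # Eqs. 108, 110
    if rule == "app":
        w = [2 * math.log((1 - pi) / pi) for pi in p]    # Eq. 109, natural log
        w0 = 2 * math.log((1 - p0) / p0)
        return w, 0.5 * (w0 + sum(w))                    # Eq. 111
    raise ValueError(f"unknown rule: {rule}")


def threshold_decide(A, w, T):
    """Eq. 107: e_0^(1)* = 1 iff the weighted sum of the checks exceeds T."""
    return 1 if sum(wi * ai for wi, ai in zip(w, A)) > T else 0


p, p0 = [0.1, 0.2, 0.3, 0.4], 0.05
for rule in ("majority", "app"):
    w, T = weights_and_threshold(p, p0, rule)
    print(rule, threshold_decide([1, 0, 1, 1], w, T))
```

With these values the rules disagree on the pattern [1, 0, 1, 1]. Majority decoding chooses $e_0^{(1)*} = 1$ because three of the four checks are "ones". APP decoding chooses $e_0^{(1)*} = 0$: two of those "ones" lie on the least reliable checks, so they carry small weights, and the term $w_0$ raises the threshold.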

4.1 DECODING FOR THE BINARY SYMMETRIC CHANNEL

We begin with the simplest case, for which the noise probability is the same for all received bits, that is, $\Pr[e_u^{(j)} = 1] = p_0$ for all $u$ and $j$. The only channel that meets this specification is the binary symmetric channel discussed in Section I. The decoding circuits for APP decoding are especially simple in this case because the weighting factors $\{w_i\}$ and the threshold $T$ are all constants. (They are, of course, always constants for majority decoding, and hence the circuits presented in this section can be used for majority decoding with any binary channel.) For example, consider a particular parity check, $A_i$, of size $n_i$. Applying Lemma 2, we have

$$p_i = \frac{1}{2}\left[1 - (1 - 2p_0)^{n_i}\right], \tag{112}$$

and this depends only on $n_i$, since $p_0$ is a constant. Thus the $\{w_i\}$ of (109) and the $T$ of (111) are all constants.
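On the binary symmetric channel the weights depend only on the check sizes. The short sketch below evaluates (112) and the APP weights of (109). The value $p_0 = 0.05$ and the check sizes 1 through 4 are illustrative assumptions.

```python
import math


def bsc_check_stats(p0, sizes):
    """(p_i, w_i) for checks of the given sizes n_i on a BSC with crossover p0.

    p_i follows Eq. 112 and w_i = 2 ln(q_i / p_i) follows Eq. 109.
    """
    stats = []
    for n in sizes:
        p_i = 0.5 * (1 - (1 - 2 * p0) ** n)
        stats.append((p_i, 2 * math.log((1 - p_i) / p_i)))
    return stats


for n, (p_i, w_i) in zip([1, 2, 3, 4], bsc_check_stats(0.05, [1, 2, 3, 4])):
    print(f"n_i = {n}: p_i = {p_i:.5f}, w_i = {w_i:.4f}")
```

Larger checks are less reliable, so they get smaller weights. Once $p_0$ and the $n_i$ are fixed, every weight is a constant that can be wired into the circuit.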
a.  Type  I  Decoder 

The first circuit that we shall present for implementing (107) is shown in Fig. 11. That this circuit is a proper threshold decoder can be seen in the following manner.

The parity-check sequences, $S^{(j)}(D)$, $j = 2, 3, \ldots, n_o$, are first formed by encoding the received information sequence, $R^{(1)}(D)$, and adding the parity sequences so formed to the received parity sequences, $R^{(j)}(D)$, $j = 2, 3, \ldots, n_o$, as explained in section 2.7. (Binary addition and subtraction are identical, since $1 + 1 = 0$ implies that $1 = -1$.) The $n_o - 1$ parity-check sequences are then stored in as many shift registers. The outputs of the storage devices at time $m$, when the decision on $e_0^{(1)}$ is to be made, are as shown in Fig. 11.

The $\{A_i\}$ are then formed by the set of $J$ modulo-two adders below the shift registers in Fig. 11. The inputs to the $i$th adder are just the set of parity checks $s_u^{(j)}$ that are added to form $A_i$. The outputs of these $J$ adders are then treated as real numbers and weighted by the factors $w_i$ of (108) or (109). This weighted sum is then compared with the constant $T$ given by (110) or (111). Thus the output of the threshold device is $e_0^{(1)*}$, the decoded estimate of $e_0^{(1)}$. This output is then added modulo-two to $r_0^{(1)}$, which at time $m$ is just emerging from the delayed input terminal of the encoder, to produce $i_0^{(1)*}$, the decoded estimate of $i_0^{(1)}$.

Fig. 11. Type I decoder.

Finally, consider the altered parity-check sequences given by

$$S^{(j)*}(D) = S^{(j)}(D) - e_0^{(1)*}\,G^{(j)}(D), \qquad j = 2, 3, \ldots, n_o. \tag{113}$$

From Eq. 63, it follows that if decoding has been correct, that is, if $e_0^{(1)*} = e_0^{(1)}$, then the effect of $e_0^{(1)}$ has been removed from the altered parity-check sequences.
Moreover, terms from time 1 through time $m+1$ have exactly the same structure with respect to $e_1^{(1)}$ as the original terms from time 0 through time $m$ had with respect to $e_0^{(1)}$. The decoding for $e_1^{(1)}$ can then be performed by using exactly the same algorithm on the altered parity-check sequences from time 1 through time $m+1$ as was used in decoding $e_0^{(1)}$ from the original parity-check sequences from time 0 through time $m$. Thus, barring a decoding error, the Type I decoder will continue to operate correctly at successive time instants when the parity-check sequences are modified according to Eq. 113. This is seen to be accomplished in Fig. 11 by feeding back the output, $e_0^{(1)*}$, of the threshold element to the modulo-two adders between stages of the shift registers that store the parity-check sequences. The connections to the adders correspond to the code-generating polynomials, $G^{(j)}(D)$.

A few more remarks concerning the Type I decoder are in order. First, the shift registers used to store the parity-check sequences contain $(n_o - k_o)m = (n_A - n_o)(1 - R)$ stages in all. The encoder section contains an additional $(n_A - n_o)R$ stages. Thus there is a total of $n_A - n_o$ stages of shift register in the Type I decoder. Second, since the all-zero sequences form a legitimate initial code word in any convolutional code, it follows that the received sequences can be fed into the decoder, beginning at time zero, without any need to disable the threshold decision element until time $m$, when the decision on $e_0^{(1)}$ is to be made. On the other hand, the information symbols output from the decoder up to time $m$ must all be zeros if the decoding is correct. The decoder output should then be monitored over this time span, and any "one" output taken as an indication of an error pattern that is likely to cause $e_0^{(1)}$ to be decoded incorrectly.
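The Type I procedure can be imitated in software. The following is a minimal sketch, not a register-level model of Fig. 11. It computes the whole parity-check sequence at once and then slides along it.

Assumptions:

- The code is a hypothetical $R = 1/2$ code with $G^{(2)}(D) = 1 + D + D^4 + D^6$. It is not one of the codes of Section III. It was chosen because its tap positions $\{0, 1, 4, 6\}$ have distinct pairwise differences, so the parity checks at offsets 0, 1, 4, and 6 from the current time form $J = 4$ checks orthogonal on the current information noise bit.
- Majority decoding is used ($w_i = 1$, $T = J/2$), and the feedback of Eq. 113 is applied.
- The information sequence ends with $m$ zeros, so every decision sees all $J$ checks.
- The names `TAPS`, `encode`, and `type1_decode` are illustrative.

```python
import random

# Hypothetical R = 1/2 code used only for illustration:
# G^(2)(D) = 1 + D + D^4 + D^6. The tap positions {0, 1, 4, 6} have distinct
# pairwise differences, so the parity checks at offsets 0, 1, 4, 6 from the
# current time are J = 4 checks orthogonal on the current e^(1).
TAPS = (0, 1, 4, 6)
M = max(TAPS)            # code memory m
J = len(TAPS)


def encode(info):
    """Parity sequence: convolve the information bits with G^(2)(D) over GF(2)."""
    return [sum(info[u - k] for k in TAPS if u - k >= 0) % 2
            for u in range(len(info))]


def type1_decode(r1, r2):
    """Type I majority decoding; the last M information bits must be a zero tail."""
    # Parity-check sequence: re-encode the received information bits and add
    # the received parity bits (modulo two).
    s = [a ^ b for a, b in zip(encode(r1), r2)]
    out = []
    for u in range(len(r1) - M):
        checks = [s[u + j] for j in TAPS]        # A_1, ..., A_J
        e_hat = 1 if sum(checks) > J / 2 else 0  # threshold element, T = J/2
        out.append(r1[u] ^ e_hat)                # decoded information bit
        if e_hat:                                # Eq. 113: remove e_hat * G(D) from S(D)
            for k in TAPS:
                s[u + k] ^= 1
    return out


random.seed(1)
info = [random.randint(0, 1) for _ in range(40)] + [0] * M
parity = encode(info)
r1, r2 = info[:], parity[:]
r1[3] ^= 1                                       # one information-bit error
r2[5] ^= 1                                       # one parity-bit error
print(type1_decode(r1, r2) == info[:40])
```

At most two of the received bits entering any one decision are in error, so every decision is correct. The feedback then keeps the later checks free of the corrected error, and the comparison prints `True`.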
b.  Type  II  Decoder 

A  second  circuit  that  implements  the  threshold-decoding  algorithms  is  shown  in 
Fig.  12.  Since  this  circuit  applies  the  algorithms  in  a  form  that  is  quite  different  from 
that  given  above,  it  will  be  necessary  at  this  point  to  present  the  theory  behind  the 
circuit  in  Fig.  12. 

Each one of a set $\{A_i\}$ of parity checks orthogonal on $e_0^{(1)}$ can be written as a sum of noise bits, one of which is $e_0^{(1)}$, that is,

$$A_i = e_0^{(1)} + \sum_{j=1}^{n_i} e_{a_j}^{(p_j)}, \qquad i = 1, 2, \ldots, J. \tag{114}$$

Since $A_i$ is a parity check, (114) can also be written

$$A_i = r_0^{(1)} + \sum_{j=1}^{n_i} r_{a_j}^{(p_j)}, \qquad i = 1, 2, \ldots, J. \tag{115}$$

We now define the quantity $B_i$ to be the sum of the received bits, excluding $r_0^{(1)}$, which appear in the expression for $A_i$, that is,

$$B_i = \sum_{j=1}^{n_i} r_{a_j}^{(p_j)}, \qquad i = 1, 2, \ldots, J. \tag{116}$$

Recalling that $r_0^{(1)} = i_0^{(1)} + e_0^{(1)}$ and substituting (114) and (115) in (116), we obtain

$$B_i = i_0^{(1)} + \sum_{j=1}^{n_i} e_{a_j}^{(p_j)}, \qquad i = 1, 2, \ldots, J. \tag{117}$$

For convenience, we define $B_0$ as

$$B_0 = r_0^{(1)} = i_0^{(1)} + e_0^{(1)}. \tag{118}$$


From Eqs. 117 and 118, it can be seen that the $\{B_i\}$ form a set of $J + 1$ equations (but not parity checks), each of which is the sum of $i_0^{(1)}$ and a set of noise bits that are such that no noise bit enters into more than one equation. By a development exactly parallel to that of section 1.1, we find that the threshold-decoding rules become:

Choose $i_0^{(1)*} = 1$ if, and only if,

$$\sum_{i=0}^{J} w_i B_i > T, \tag{119}$$


where the $\{B_i\}$ are treated as real numbers in this sum, and the $\{w_i\}$ and $T$ have the same meanings as in Eqs. 108-111. When $J$ is odd, the majority decoding rule given here is not exactly the same as that of (107). For a full equivalence in this instance, the rules of (119) should be stated: Choose $i_0^{(1)*} = r_0^{(1)} + 1$ (modulo two, that is, $e_0^{(1)*} = 1$) when $\sum_{i=0}^{J} B_i = (J+1)/2$.


This algorithm performs the same decoding operation as the algorithm stated in (107), the only difference being that the former gives $i_0^{(1)*}$ (the decoded estimate of $i_0^{(1)}$) directly, whereas the latter gives $e_0^{(1)*}$, from which the decoded estimate of $i_0^{(1)}$ can be found as $i_0^{(1)*} = r_0^{(1)} - e_0^{(1)*}$.

The operation of the Type II decoder shown in Fig. 12 can now be readily explained. The received sequences are stored in $n_o$ shift registers so that the received bits are available for the formation of the $\{B_i\}$. The $\{B_i\}$ are formed by the set of adders beneath the shift registers, the inputs to the $i$th such adder being the set of $n_i$ received bits in Eq. 116 whose sum is $B_i$. The adder outputs are then the $\{B_i\}$, and these are weighted and compared with the threshold in accordance with (119). The output of the threshold element is then $i_0^{(1)*}$. Finally, the circuit is prepared to operate correctly at successive time instants by altering the received sequences according to Eq. 65. This is accomplished by feeding back the output, $i_0^{(1)*}$, of the threshold device to the modulo-two adders between stages of the shift registers, with the connections to the adders corresponding to the code-generating polynomials $G^{(j)}(D)$.

The Type II decoder contains $n_o$ shift registers of $m$ stages each, for a total of $mn_o = n_A - n_o$ stages of shift register. This is the same total number as for the Type I decoder. Moreover, as for the Type I decoder, the fact that the all-zero sequences are valid code words means that the threshold element need not be disabled from time zero, when the inputs are first applied, to time $m$, when the decision on $i_0^{(1)}$ is made. However, unlike the Type I decoder, the Type II decoder cannot also serve as an encoder without modification.
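The sketch below models the Type II decoder in software.

Assumptions:

- The code is a hypothetical $R = 1/2$ code with $G^{(2)}(D) = 1 + D + D^4 + D^6$, whose tap positions have distinct pairwise differences.
- Majority decoding is used, and a zero tail of $m$ bits ends the information sequence.
- For the feedback step, the sketch subtracts the decoded information bit times $G^{(2)}(D)$ from the stored parity bits. This is our assumed reading of the alteration by Eq. 65. Earlier information bits are simply never read again.
- All names are illustrative.

The sketch forms $B_0 = r_u^{(1)}$, and each $B_i$ as the sum of the received bits of one orthogonal check other than $r_u^{(1)}$. It then decides on the information bit directly, as in (119). Since $J = 4$ is even, the $J + 1$ terms cannot tie.

```python
import random

# Hypothetical R = 1/2 code for illustration: G^(2)(D) = 1 + D + D^4 + D^6.
TAPS = (0, 1, 4, 6)
M = max(TAPS)
J = len(TAPS)


def encode(info):
    """Parity sequence: convolve the information bits with G^(2)(D) over GF(2)."""
    return [sum(info[u - k] for k in TAPS if u - k >= 0) % 2
            for u in range(len(info))]


def type2_decode(r1, r2):
    """Type II majority decoding of the information bits via B_0, ..., B_J (Eq. 119)."""
    r2 = list(r2)                       # working copy of the stored parity bits
    out = []
    for u in range(len(r1) - M):
        b = [r1[u]]                     # B_0 = received information bit
        for j in TAPS:
            # B_i: received bits of the check at offset j, excluding r1[u];
            # earlier information bits are accounted for by the feedback below
            b.append((r2[u + j] + sum(r1[u + j - k] for k in TAPS if k < j)) % 2)
        i_hat = 1 if sum(b) > (J + 1) / 2 else 0   # J even, so no ties
        out.append(i_hat)
        if i_hat:                       # remove i_hat * G^(2)(D) from the parity bits
            for k in TAPS:
                r2[u + k] ^= 1
    return out


random.seed(1)
info = [random.randint(0, 1) for _ in range(40)] + [0] * M
parity = encode(info)
r1, r2 = info[:], parity[:]
r1[3] ^= 1
r2[5] ^= 1
print(type2_decode(r1, r2) == info[:40])
```

Because the $B_i$ involve disjoint sets of noise bits, a majority vote over the five of them corrects any two errors that enter a decision. For $J$ even, this gives the same decisions as majority decoding on the $\{A_i\}$.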
c. Summary

The  material  of  the  preceding  discussions  can  be  summarized  in  a  theorem. 
THEOREM 15: A binary convolutional code with rate $R = 1/n_o$ and constraint length $n_A$ can be majority-decoded for any binary output channel, or APP-decoded for the binary symmetric channel, by a sequential network containing $n_A - n_o$ stages of shift register and one threshold logical element.

It is interesting to note that the components in both Type I and Type II decoders work at the rate of one operation per time unit, whereas the received bit rate is $n_o$ bits per time unit. This fact permits the use of lower-speed components in the construction of the decoders.

Also, it seems plausible that these decoders contain the minimum storage possible for a convolutional decoder. This can be shown in the following manner. Since $n_A$ received bits are required for making the decision on $e_0^{(1)}$ and $n_o$ bits are received at any time instant, at least $n_A - n_o$ received bits (or their equivalents) must be stored in the decoder.

Finally, it should now be clear that for the case $R = k_o/n_o$ the Type I and Type II decoders would be modified to include a total of $k_o$ threshold devices, each with its own set of adders to form the set of parity checks orthogonal on $e_0^{(j)}$ for $j = 1, 2, \ldots, k_o$. The outputs of the threshold elements would then be $e_0^{(j)*}$ in the Type I decoder, or $i_0^{(j)*}$ in the Type II decoder. Thus for the case $R = k_o/n_o$, Theorem 15 would be unchanged, except that the last phrase would read "$k_o$ threshold logical elements."

4.2  DECODING  FOR  TIME-VARIANT  CHANNELS 

We now consider the case in which the received bits do not all have the same error probability, but these probabilities are known at the receiver. In other words, the quantities $\Pr[e_u^{(j)} = 1]$ are known, but may vary with $u$ and $j$.

An example of such a channel would be one that adds to the transmitted waveform a voltage pulse with a Gaussian-distributed amplitude. The transmitted waveform is assumed to be either a pulse of $+\sqrt{S}$ volts for a "one" or a pulse of $-\sqrt{S}$ volts for a "zero." The receiver then uses the polarity of the received pulse to assign to the received bit the more probable value of the binary number transmitted. The amplitude of the received pulse can be used to compute the probability that this assignment was wrong. (See sec. 5.3 for more details concerning this channel.)
Another example is the binary erasure channel shown in Fig. 13. When a "one" or "zero" is transmitted, it is received correctly with probability $q$ and is erased with probability $p$. Then, at the receiver, any received "one" or "zero" has zero error probability, whereas any erased symbol can be assigned as a "zero" and has error probability one-half. (See sec. 5.2 for more details concerning this channel.)

We shall now show how APP decoding can be instrumented for the class of time-variant binary-output channels.
a.  Weighting  Factors 

The application of APP decoding to time-variant channels requires the calculation of the weighting factors $\{w_i\}$ and the threshold $T$ of Eqs. 109 and 111, and these quantities are now functions of time.


Fig.  13.  Binary  Erasure  Channel. 


Consider the set $\{A_i\}$ of parity checks orthogonal on $e_0^{(1)}$. Let $\gamma_j$ be the error probability of the $j$th noise bit, exclusive of $e_0^{(1)}$, which is checked by a particular $A_i$. Then, from Lemma 2, we have

$$p_i = \frac{1}{2}\left[1 - \prod_{j=1}^{n_i}(1 - 2\gamma_j)\right] = 1 - q_i. \tag{120}$$

Hence, it follows that

$$\frac{q_i}{p_i} = \frac{1 + \prod_{j=1}^{n_i}(1 - 2\gamma_j)}{1 - \prod_{j=1}^{n_i}(1 - 2\gamma_j)}. \tag{121}$$

It is more convenient to write (121) as follows. Let

$$\alpha_j^{(i)} = -\log_e(1 - 2\gamma_j). \tag{122}$$

Then (121) may be written

$$\frac{q_i}{p_i} = \coth\left(\frac{1}{2}\sum_{j=1}^{n_i}\alpha_j^{(i)}\right), \tag{123}$$

where $\coth(x) = (e^x + e^{-x})/(e^x - e^{-x})$ is the ordinary hyperbolic cotangent function. Then, with natural logarithms used, the weighting factor $w_i$ of Eq. 109 becomes

$$w_i = 2\log_e\left[\coth\left(\frac{1}{2}\sum_{j=1}^{n_i}\alpha_j^{(i)}\right)\right], \tag{124}$$

and also, since we always have $p_0 = \Pr[e_0^{(1)} = 1]$, we obtain

$$w_0 = 2\log_e\frac{q_0}{p_0} = 2\log_e\left[\coth\left(\frac{1}{2}c_0^{(1)}\right)\right], \tag{125}$$

where $c_0^{(1)} = -\log_e(1 - 2p_0)$.
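A numerical check confirms that (120)-(121) and the coth form (123)-(124) give the same weight. The error probabilities below are arbitrary illustrative values. They must satisfy $0 < \gamma_j < 1/2$ for the logarithms to be defined.

```python
import math


def weight_via_lemma2(gammas):
    """w_i = 2 ln(q_i / p_i), with p_i computed from Lemma 2 (Eqs. 120-121)."""
    product_term = math.prod(1 - 2 * g for g in gammas)
    p_i = 0.5 * (1 - product_term)
    return 2 * math.log((1 - p_i) / p_i)


def weight_via_coth(gammas):
    """Eq. 124: w_i = 2 ln coth(x / 2), where x = sum_j alpha_j (Eq. 122)."""
    x = sum(-math.log(1 - 2 * g) for g in gammas)
    return 2 * math.log(1 / math.tanh(x / 2))


gammas = [0.01, 0.2, 0.07]
print(weight_via_lemma2(gammas), weight_via_coth(gammas))
```

The coth form is the practical one: the per-bit quantities $\alpha_j$ simply add, so a product of reliabilities becomes a sum followed by one fixed nonlinearity.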

b.  Analog  Circuit  for  Computing  Weighting  Factors 

A circuit that calculates the $\{w_i\}$ and the threshold, based on (124) and (125), can now be obtained. Such a circuit is shown in Fig. 14.

Fig. 14. Analog circuit for computing time-variant weighting factors and threshold.

The inputs to the circuit shown in Fig. 14 are assumed to be the $n_o$ sequences

$$C^{(j)}(D) = c_0^{(j)} + c_1^{(j)}D + c_2^{(j)}D^2 + \cdots, \qquad j = 1, 2, \ldots, n_o, \tag{126}$$

where $c_u^{(j)}$ is the input on line $j$ at time $u$ and has the value

$$c_u^{(j)} = -\log_e\left(1 - 2\Pr[e_u^{(j)} = 1]\right). \tag{127}$$

The weighting factors $\{w_i\}$ are computed (Fig. 14) as follows. The $i$th analog adder, beneath the analog shift registers that store the $c_u^{(j)}$, has as inputs the set of $c_u^{(j)}$ corresponding to the noise bits $e_u^{(j)}$, exclusive of $e_0^{(1)}$, which are checked by $A_i$. The output of this analog adder is fed to a nonlinear device that has an output of $2\log_e[\coth(\frac{1}{2}x)]$ for an input of $x$. From (124), it follows that this is the correct weighting factor for parity check $A_i$. The threshold $T$ is formed by taking half the sum of all the weighting factors, as called for by Eq. 111.
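A software analogue of the signal flow in Fig. 14 might look like the sketch below. The function names are hypothetical. The sketch assumes the receiver supplies each bit's error probability, strictly between 0 and 1/2. It proceeds as follows:

1. For each check, an "adder" sums the $c$ values of the bits that check contains.
2. The $2\log_e\coth(x/2)$ nonlinearity turns each sum into a weight $w_i$.
3. The threshold is set to half the sum of all weights, including $w_0$.
4. Rule (107) is applied.

The four checks of sizes 1 to 4 are chosen only for illustration.

```python
import math


def c_value(p_err):
    """Eq. 127: analog input c = -ln(1 - 2 Pr[e = 1]) for one received bit."""
    return -math.log(1 - 2 * p_err)


def nonlinearity(x):
    """Nonlinear device of Fig. 14: output 2 ln coth(x / 2) for input x > 0."""
    return 2 * math.log(1 / math.tanh(x / 2))


def app_decide(c0, check_inputs, A):
    """Time-variant APP decision on e_0^(1).

    c0           -- c value of the received bit r_0^(1)
    check_inputs -- check_inputs[i] lists the c values of the noise bits,
                    exclusive of e_0^(1), that are checked by A_{i+1}
    A            -- observed values of the orthogonal parity checks
    """
    w0 = nonlinearity(c0)                                   # Eq. 125
    w = [nonlinearity(sum(cs)) for cs in check_inputs]      # adder + nonlinearity, Eq. 124
    T = 0.5 * (w0 + sum(w))                                 # Eq. 111
    decision = 1 if sum(wi * ai for wi, ai in zip(w, A)) > T else 0   # Eq. 107
    return decision, w, T


good = c_value(0.02)
checks = [[good] * n for n in (1, 2, 3, 4)]
print(app_decide(good, checks, [1, 1, 1, 0])[0])
checks[0] = [c_value(0.45)]           # the single bit in A_1 is now unreliable
print(app_decide(good, checks, [1, 1, 1, 0])[0])
```

The check pattern [1, 1, 1, 0] decodes differently depending on bit reliability:

- With every bit at error probability 0.02, it gives $e_0^{(1)*} = 1$.
- If the single bit in the first check has error probability 0.45, that check's weight collapses, and the same pattern gives $e_0^{(1)*} = 0$.

Majority decoding would give 1 in both cases.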

Since the analog circuit of Fig. 14 computes the correct set of weights $\{w_i\}$ and the threshold $T$ at every time instant, it can be combined with either the Type I or the Type II decoder to give a complete APP decoding circuit for a time-variant channel. In Fig. 15 we show the complete decoding circuit for the $R = 1/2$, $J = 4$, trial-and-error code of Table I, in which we have coupled a Type I decoder to the analog circuit of Fig. 14.

Fig. 15. Complete decoding circuit for the $R = 1/2$, $n_E = 11$ trial-and-error code.

4.3  SUMMARY 

We have presented circuits that may be used to threshold-decode any binary convolutional code. The decoding circuits are quite simple, containing what is apparently the minimum possible number of storage devices and using a simple threshold logical element as the decision component. The decoding circuits make only one operation per time unit (during which $n_o$ bits are received), and it is certainly feasible to construct such circuits for real-time decoding on channels for which the input bit rate is in the megacycle region. The fact that there is no variance in the decoding time per received bit is also desirable, in that it eliminates any queueing problems for the received bits.

All of these features of the decoding circuits are quite attractive to the communication engineer. The central question, however, remains unanswered: What error probabilities can be obtained at the receiver? Answers to this question will be given in Section V.
V. PERFORMANCE DATA FOR THRESHOLD DECODING
OF CONVOLUTIONAL CODES


We shall conclude our treatment of convolutional codes with a presentation of the error probabilities that can be attained by using threshold decoding. We shall consider several communication channels, and our interest will always be in the quantity $P_1(e)$. $P_1(e)$ was defined in section 2.5 as the average probability of incorrectly decoding the set of first information symbols. Since nearly all of the codes constructed in Section III had rates of the form $R = 1/n_o$, we shall restrict ourselves to such codes. In this case $P_1(e)$ becomes simply the average probability of incorrectly decoding $i_0^{(1)}$.

There are two main points concerning threshold decoding of convolutional codes that will be made here: on the one hand, its lack of generality; on the other hand, its excellent performance in particular cases. As a specific example, recall the binary symmetric channel of Fig. 1. The random-coding bound of section 2.6 shows that when $R < C$, the average of $P_1(e)$ over the ensemble of convolutional codes approaches zero exponentially with the constraint length, $n_A$. In sharp contrast to this result, we shall show that when threshold decoding is used, $P_1(e)$ always exceeds some positive constant no matter how large $n_A$ becomes (at least when the codes satisfy the corollary of Theorem 10). On the other hand, we shall see that, for codes of moderate length, the performance of threshold decoding compares favorably with the random-coding bound and with the error probabilities that can be obtained by using other available decoding systems.


5.1  THE  BINARY  SYMMETRIC  CHANNEL 

The major portion of this section will be devoted to the study of $P_1(e)$ for the binary symmetric channel of Fig. 1. We place our emphasis on this channel for two reasons. First, it is the channel that has been studied most thoroughly by communication theorists, and is now familiar to all communication engineers. Second, it is reasonable to infer that the performance of threshold decoding on this channel should be typical of its performance on a broader class of binary output channels.


a.  A  Bound  on  Error  Probability 


We wish to show that $P_1(e)$ cannot be made to vanish by using threshold decoding. The demonstration will be facilitated by making use of the log-likelihood-ratio, $L$, which we define as

$$L = \log_e\frac{\Pr[e_0^{(1)} = 1 \mid \{A_i\}]}{\Pr[e_0^{(1)} = 0 \mid \{A_i\}]}. \tag{128}$$


Using Eqs. 14 and 15, we can write Eq. 128 as

$$L = \sum_{i=1}^{J}\log_e\frac{\Pr[A_i \mid e_0^{(1)} = 1]}{\Pr[A_i \mid e_0^{(1)} = 0]} + \log_e\frac{\Pr[e_0^{(1)} = 1]}{\Pr[e_0^{(1)} = 0]}. \tag{129}$$


Finally, using Eqs. 18 and 19, we can reduce Eq. 129 to

$$L = \sum_{i=1}^{J} w_i A_i - T, \tag{130}$$

where the $\{A_i\}$ are treated as real numbers in this sum, and the weighting factors $\{w_i\}$ and the threshold $T$ have the values that were assigned in Eqs. 109 and 111. Equation 130 expresses the interesting fact that, when APP decoding is used, the log-likelihood-ratio is just the difference between the weighted sum of the orthogonal parity checks and the threshold.

The log-likelihood-ratio is a useful quantity; when $L$ is known, the error probability in decoding $i_0^{(1)}$, $P_1(e \mid L)$, can be determined from

$$P_1(e \mid L) = \frac{1}{1 + e^{|L|}}. \tag{131}$$

Equation 131 follows directly from (128) when it is noted that for $L > 0$ we have $P_1(e \mid L) = \Pr[e_0^{(1)} = 0 \mid \{A_i\}]$, while for $L \le 0$ we have $P_1(e \mid L) = \Pr[e_0^{(1)} = 1 \mid \{A_i\}]$.
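Equations 130 and 131 can be checked against a direct application of Bayes' rule. The sketch below assumes, as in section 1.1d, that the checks are conditionally independent given $e_0^{(1)}$, with $\Pr[A_i = 1 \mid e_0^{(1)} = 0] = p_i$ and $\Pr[A_i = 1 \mid e_0^{(1)} = 1] = q_i$. The values of $p_0$ and the $p_i$ are illustrative: they are (112) for $p_0 = 0.05$ and check sizes 1 to 4, rounded.

```python
import math
from itertools import product


def llr_bayes(p0, p, A):
    """Eq. 128 computed directly: ln(Pr[e0 = 1 | A] / Pr[e0 = 0 | A])."""
    joint1, joint0 = p0, 1 - p0                 # priors Pr[e0 = 1], Pr[e0 = 0]
    for pi, a in zip(p, A):
        joint0 *= pi if a else 1 - pi           # given e0 = 0, Pr[A_i = 1] = p_i
        joint1 *= (1 - pi) if a else pi         # given e0 = 1, Pr[A_i = 1] = q_i
    return math.log(joint1 / joint0)


def llr_weighted_sum(p0, p, A):
    """Eq. 130: L = sum_i w_i A_i - T, with w_i (Eq. 109) and T (Eq. 111)."""
    w = [2 * math.log((1 - pi) / pi) for pi in p]
    T = 0.5 * (2 * math.log((1 - p0) / p0) + sum(w))
    return sum(wi * a for wi, a in zip(w, A)) - T


def error_prob_given_llr(L):
    """Eq. 131: P_1(e | L) = 1 / (1 + e^|L|)."""
    return 1 / (1 + math.exp(abs(L)))


p0, p = 0.05, [0.05, 0.095, 0.1355, 0.17195]
for A in product((0, 1), repeat=len(p)):
    L = llr_bayes(p0, p, A)
    print(A, round(L, 4), round(llr_weighted_sum(p0, p, A), 4), error_prob_given_llr(L))
```

For every check pattern the two log-likelihood-ratio columns coincide. The error probability in the last column is smallest when all checks are "zeros", the case used in the proof of Theorem 16.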


We  are  now  in  a  position  to  prove  that  when  the  codes  are  constructed  to  satisfy  the 
corollary  of  Theorem  10,  the  probability  of  error  cannot  be  made  arbitrarily  small  if 
threshold  decoding  is  used. 

THEOREM 16: Consider a binary convolutional code with $R = 1/n_o$ for which a set $\{A_i\}$ of $J = I(n_o - 1)$ parity checks orthogonal on $e_0^{(1)}$ is formed. Suppose that $n_o - 1$ of these parity checks have $n_i = j$ for $j = 1, 2, \ldots, I$; then for any $J$, $P_1(e)$ obtained by threshold decoding satisfies

$$P_1(e) > \frac{1}{2}\left(\frac{p_0}{q_0}\right)^{(2p_0 + n_o - 1)/2p_0} \tag{132}$$

when the code is used on a binary symmetric channel with transition probability $p_0 = 1 - q_0$.

PROOF: For threshold decoding, $P_1(e)$ is a minimum when APP decoding is used. Moreover, $P_1(e)$ must decrease monotonically as $J$ increases. APP decoding always makes best use of the information in the set $\{A_i\}$, and the set $\{A_i\}$ for a longer code constructed in accordance with the theorem always includes the set $\{A_i\}$ of each smaller code as a subset. Thus we need only show that, as $J \to \infty$, $P_1(e)$ satisfies (132) when APP decoding is used. We begin by showing that $|L|$ is bounded.

Using Eq. 111, we may write Eq. 130 as
$$L = \sum_{i=1}^{J} w_i A_i - \frac{1}{2}\sum_{i=0}^{J} w_i, \tag{133}$$

from which it follows that $|L|_{\max}$ is obtained when all of the $\{A_i\}$ are "zeros," that is, when all of the orthogonal parity checks are satisfied. Thus

$$|L|_{\max} = \frac{1}{2}\sum_{i=0}^{J} w_i = \sum_{i=0}^{J}\log_e\frac{q_i}{p_i}. \tag{134}$$

Substituting Eq. 112 in (134), we obtain


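$$|L|_{\max} = (n_o - 1)\sum_{j=1}^{I}\log_e\frac{1 + (1 - 2p_0)^j}{1 - (1 - 2p_0)^j} + \log_e\frac{q_0}{p_0}. \tag{135}$$

Using the series expansion for the logarithm,

$$\log_e\frac{1 + x}{1 - x} = 2\left(x + \frac{1}{3}x^3 + \frac{1}{5}x^5 + \cdots\right), \qquad 0 \le x < 1, \tag{136}$$

we can write (135) as

$$\lim_{J\to\infty}|L|_{\max} = \log_e\frac{q_0}{p_0} + (n_o - 1)(2)\sum_{j=1}^{\infty}\left[(1 - 2p_0)^j + \frac{1}{3}(1 - 2p_0)^{3j} + \frac{1}{5}(1 - 2p_0)^{5j} + \cdots\right]. \tag{137}$$

Since the series in brackets converges absolutely, the summation on each term may be carried out separately as a geometric series and gives

$$\lim_{J\to\infty}|L|_{\max} = \log_e\frac{q_0}{p_0} + (n_o - 1)(2)\left[\frac{1 - 2p_0}{1 - (1 - 2p_0)} + \frac{1}{3}\,\frac{(1 - 2p_0)^3}{1 - (1 - 2p_0)^3} + \cdots\right]. \tag{138}$$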


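Since $1 - (1 - 2p_0)^k \ge 1 - (1 - 2p_0)$ for every $k \ge 1$, we can overbound the series on the right to give

$$\lim_{J\to\infty}|L|_{\max} < \log_e\frac{q_0}{p_0} + \frac{n_o - 1}{1 - (1 - 2p_0)}(2)\left[(1 - 2p_0) + \frac{1}{3}(1 - 2p_0)^3 + \cdots\right]. \tag{139}$$

We recognize the series on the right, together with its factor of 2, as the expansion of

$$\log_e\frac{1 + (1 - 2p_0)}{1 - (1 - 2p_0)} = \log_e\frac{q_0}{p_0}, \tag{140}$$

and hence, since $1 - (1 - 2p_0) = 2p_0$, Eq. 139 becomes

$$\lim_{J\to\infty}|L|_{\max} < \frac{2p_0 + n_o - 1}{2p_0}\log_e\frac{q_0}{p_0}. \tag{141}$$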


From Eq. 131, it follows that $P_1(e \mid L)$ is a minimum when $|L|$ is a maximum and that this probability is greater than $\frac{1}{2}e^{-|L|_{\max}}$ (since $1 + e^{|L|} < 2e^{|L|}$). Substituting (141), we find that $P_1(e \mid L)$ must satisfy

$$P_1(e \mid L) > \frac{1}{2}\left(\frac{p_0}{q_0}\right)^{(2p_0 + n_o - 1)/2p_0}. \tag{142}$$

But since $P_1(e)$ is the average over all values of $L$ of $P_1(e \mid L)$, $P_1(e)$ must certainly be greater than the minimum value of $P_1(e \mid L)$, and hence greater than the right-hand side of Eq. 142. This is the result stated in the theorem.

From Theorem 16 we see that $P_1(e)$ cannot be made to vanish by using threshold decoding with the class of codes that satisfy the corollary of Theorem 10. We saw in Theorem 12 that for $R = 1/2$ this class of codes is the best that can be constructed for threshold decoding. Thus for $R = 1/2$ it is impossible to make $P_1(e)$ arbitrarily small by threshold decoding. We can conjecture that, for other rates also, $P_1(e)$ cannot be made arbitrarily small when threshold decoding is used, but we have not been able to prove this.

We have actually proved more than Theorem 16 states. Inequality (142) shows that, for the class of codes considered in Theorem 16, no decoding decision is ever made that has an error probability smaller than the right-hand side of (142) or (132). Even in the most favorable case (when the entire set of orthogonal parity checks is satisfied), the probability of a decoding error exceeds the bound on the right-hand side of (132). For this reason, (132) does not give a good lower bound on $P_1(e)$, but it does give a good bound on the minimum value of $P_1(e \mid L)$.
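The bound of Theorem 16 is easy to tabulate. The sketch below uses hypothetical helper names. It evaluates the finite sum (135) for increasing $I$ and compares it with the limiting bound (141) and with the error floor (132) derived from it. The channel parameters in the example are arbitrary.

```python
import math


def llr_max(p0, n0, I):
    """Eq. 135: |L|_max for J = I(n0 - 1) checks, n0 - 1 of each size j = 1..I."""
    x = 1 - 2 * p0
    total = math.log((1 - p0) / p0)                 # the w_0 / 2 term
    for j in range(1, I + 1):
        total += (n0 - 1) * math.log((1 + x ** j) / (1 - x ** j))
    return total


def llr_max_bound(p0, n0):
    """Eq. 141: upper bound on the limit of |L|_max as J -> infinity."""
    return (2 * p0 + n0 - 1) / (2 * p0) * math.log((1 - p0) / p0)


def error_floor(p0, n0):
    """Eq. 132: lower bound on P_1(e) that holds for every J."""
    return 0.5 * (p0 / (1 - p0)) ** ((2 * p0 + n0 - 1) / (2 * p0))


p0, n0 = 0.05, 2
for I in (1, 5, 20, 100):
    print(I, llr_max(p0, n0, I))
print("bound on the limit:", llr_max_bound(p0, n0))
print("error floor:", error_floor(p0, n0))
```

For $p_0 = 0.05$ and $n_o = 2$ the exponent in (132) is $(0.1 + 1)/0.1 = 11$. The floor is therefore tiny, which is consistent with the remark above that (132) is a weak bound on $P_1(e)$ itself.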

b. Data for the Trial-and-Error Codes

Having established the lack of generality of threshold decoding, we turn next to the task of computing $P_1(e)$ in particular cases. For this purpose, we choose the trial-and-error codes of Table III. These codes have the properties that $n_E$ and $n_A$ are nearly equal, and that $n_E$ is never greater than the bound in Theorem 10.

Before expressing $P_1(e)$ in the form best suited for the binary symmetric channel, we shall give the general expression that applies to any binary output channel, whether time-variant or not. As usual, let $p_0 = 1 - q_0 = \Pr[e_0^{(1)} = 1]$; $p_0$ is then a random variable in the case of a time-variant channel. The general expression for $P_1(e)$ is

$$P_1(e) = \overline{q_0\Pr\left[\sum_{i=1}^{J} w_i A_i > T \,\middle|\, e_0^{(1)} = 0\right] + p_0\Pr\left[\sum_{i=1}^{J} w_i A_i \le T \,\middle|\, e_0^{(1)} = 1\right]}, \tag{143}$$

in which the bar indicates that the average is to be taken over all values of the weighting factors, $w_i = 2\log_e(q_i/p_i)$, $i = 0, 1, 2, \ldots, J$. Equation 143 states the obvious fact that an error is committed when either of two mutually exclusive events occurs: the threshold is exceeded when $e_0^{(1)} = 0$, or it is not exceeded when $e_0^{(1)} = 1$. But since the $\{A_i\}$ conditioned on $e_0^{(1)} = 1$ are the complements of the $\{A_i\}$ conditioned on $e_0^{(1)} = 0$, we have

$$\Pr\left[\sum_{i=1}^{J} w_i A_i \le T \,\middle|\, e_0^{(1)} = 1\right] = \Pr\left[\sum_{i=1}^{J} w_i A_i \ge \sum_{i=1}^{J} w_i - T \,\middle|\, e_0^{(1)} = 0\right]. \tag{144}$$

Hence, (143) may be rewritten as


$$P_1(e) = \overline{q_0\Pr\left[\sum_{i=1}^{J} w_i A_i > T \,\middle|\, e_0^{(1)} = 0\right] + p_0\Pr\left[\sum_{i=1}^{J} w_i A_i \ge \sum_{i=1}^{J} w_i - T \,\middle|\, e_0^{(1)} = 0\right]}, \tag{145}$$

and we see that only the probability distributions of the $\{A_i\}$ conditioned on $e_0^{(1)} = 0$ need be considered. For the binary symmetric channel, the bar in (145) can be ignored because the weighting factors are constants.

We have seen (section 1.1d) that the $\{A_i\}$ conditioned on $e_0^{(1)} = 0$ are a set of J statistically independent random variables for which $\Pr(A_i = 1) = p_i = 1 - \Pr(A_i = 0)$. We define the random variables $X_i$ as


$$X_i = w_i A_i, \qquad i = 1, 2, \ldots, J, \qquad (146)$$


where the $A_i$ are treated as real numbers. For the binary symmetric channel, Equation 145 may then be written


$$P_1(e) = q_0 \Pr\left[\sum_{i=1}^{J} X_i > T\right] + p_0 \Pr\left[\sum_{i=1}^{J} X_i \ge \sum_{i=1}^{J} w_i - T\right] \qquad (147)$$


where the $X_i$ are a set of J statistically independent random variables for which $\Pr(X_i = w_i) = p_i$ and $\Pr(X_i = 0) = q_i$.

The determination of $P_1(e)$ from Eq. 147 reduces to the classical problem of calculating the probability that a sum of statistically independent random variables exceeds some fixed number. However, we have been unable to obtain a closed-form expression for $P_1(e)$, or even a good lower bound, for the following reasons. Since, by Eq. 112, for the binary symmetric channel


$$p_i = 1 - q_i = \tfrac{1}{2}\left[1 - (1 - 2p_0)^{n_i}\right] \qquad (148)$$

the $X_i$ are not equidistributed in the general case. Moreover, since the largest $n_i$ grows in direct proportion to J (at least for codes that satisfy the corollary of Theorem 10), the $p_i$ of the later checks approach 1/2 as J increases; this means that their weighting factors approach zero in APP decoding. For large J, the distribution of the sum of the $X_i$ is determined almost entirely by the first several $X_i$, those with small values of $n_i$. For these reasons, the standard procedures for handling sums of independent random variables, such as the Chernoff bound or the Central Limit Theorem, cannot be applied.
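To see concretely why the weights fade, one can tabulate Eq. 148 and the APP weighting factors for checks of growing size. The following is a minimal Python sketch, not part of the original analysis; it assumes natural logarithms in $w_i = 2\log_e(q_i/p_i)$, and the channel value $p_0 = 0.05$ and the sizes listed are arbitrary illustrative choices (the function names are hypothetical).

```python
import math


def check_error_prob(p0: float, n_i: int) -> float:
    """Eq. 148: p_i = (1/2)[1 - (1 - 2 p0)^n_i], the probability that the
    n_i noise bits other than e_0^(1) in a check have odd parity."""
    return 0.5 * (1.0 - (1.0 - 2.0 * p0) ** n_i)


def app_weight(p0: float, n_i: int) -> float:
    """APP weighting factor w_i = 2 log_e(q_i / p_i) for a check of size n_i."""
    p_i = check_error_prob(p0, n_i)
    return 2.0 * math.log((1.0 - p_i) / p_i)


if __name__ == "__main__":
    p0 = 0.05  # illustrative BSC transition probability
    for n_i in (1, 5, 10, 20, 40):
        print(f"n_i={n_i:3d}  p_i={check_error_prob(p0, n_i):.4f}  "
              f"w_i={app_weight(p0, n_i):.4f}")
```

As $n_i$ grows, $p_i$ climbs toward 1/2 and $w_i$ falls toward zero, so the few smallest checks dominate the weighted sum.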

A numerical evaluation of Eq. 147 is facilitated by the use of the enumerating functions that were mentioned in section 4.1. The enumerating function, $g_i(s)$, for $X_i$ on the binary symmetric channel is


$$g_i(s) = p_i s^{w_i} + q_i \qquad (149)$$


Since the $X_i$ are statistically independent, the enumerating function, $g(s)$, of their sum is just the product of the $g_i(s)$, or

$$g(s) = \prod_{i=1}^{J} \left(p_i s^{w_i} + q_i\right) \qquad (150)$$


Then,  since  each  coefficient  in  an  enumerating  function  is  the  probability  that  the  random 
variable  is  equal  to  the  exponent  of  s  in  that  term,  (147)  can  be  stated 


$$P_1(e) = q_0\left[\text{sum of coefficients in } g(s) \text{ of terms with exponents} > T\right] + p_0\left[\text{sum of coefficients in } g(s) \text{ of terms with exponents} \ge \sum_{i=1}^{J} w_i - T\right] \qquad (151)$$


Equation 151 was used as the basis for a machine calculation of $P_1(e)$ for the trial-and-error codes of Table III. For majority decoding, since $w_i = 1$ for all i, $g(s)$ in (150) is an ordinary polynomial with $J + 1$ terms, and the calculation of $P_1(e)$ is quite simple. For APP decoding, the $w_i$ can all be different, and $g(s)$ then contains as many as $2^J$ terms, making machine calculation mandatory in all but the simplest cases.
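Equation 151 translates directly into a small program: build the coefficients of $g(s)$ one factor at a time, then add up the coefficients whose exponents exceed the relevant thresholds. The Python sketch below is an illustration, not the program used for Figs. 16-19. It assumes the APP threshold $T = \tfrac{1}{2}\sum_{i=0}^{J} w_i$ with $w_0 = 2\log_e(q_0/p_0)$ and, for majority decoding, unit weights with threshold $T = J/2$; the helper names are hypothetical. Exponents are rounded so that equal sums of real-valued weights merge into one term.

```python
import math
from collections import defaultdict


def check_error_prob(p0, n_i):
    """Eq. 148 for the binary symmetric channel."""
    return 0.5 * (1.0 - (1.0 - 2.0 * p0) ** n_i)


def sum_distribution(weights, probs):
    """Coefficients of g(s) = prod_i (p_i s^{w_i} + q_i), Eq. 150,
    returned as {exponent: probability}."""
    dist = {0.0: 1.0}
    for w, p in zip(weights, probs):
        nxt = defaultdict(float)
        for x, pr in dist.items():
            nxt[round(x, 12)] += pr * (1.0 - p)   # X_i = 0 with probability q_i
            nxt[round(x + w, 12)] += pr * p       # X_i = w_i with probability p_i
        dist = dict(nxt)
    return dist


def p1_error(p0, sizes, rule="app", eps=1e-9):
    """P_1(e) from Eq. 151 for J orthogonal checks of sizes n_1..n_J."""
    probs = [check_error_prob(p0, n) for n in sizes]
    if rule == "app":
        weights = [2.0 * math.log((1.0 - p) / p) for p in probs]
        w0 = 2.0 * math.log((1.0 - p0) / p0)
        T = 0.5 * (w0 + sum(weights))
    else:  # majority decoding: every check weighted 1, threshold J/2
        weights = [1.0] * len(sizes)
        T = len(sizes) / 2.0
    W = sum(weights)
    dist = sum_distribution(weights, probs)
    # eps keeps exact ties on the correct side despite floating-point rounding
    above_T = sum(pr for x, pr in dist.items() if x > T + eps)
    at_least = sum(pr for x, pr in dist.items() if x >= W - T - eps)
    return (1.0 - p0) * above_T + p0 * at_least


if __name__ == "__main__":
    # Repetition code with n0 = 10: nine checks, each of size n_i = 1.
    print(p1_error(0.110, [1] * 9, "app"), p1_error(0.110, [1] * 9, "majority"))
```

For the repetition case in the main block (nine checks of size one, $p_0 = 0.110$), the sketch gives $P_1(e) \approx 1.38 \times 10^{-3}$, the value read from Fig. 19 in the Gaussian-channel example of section 5.3.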

In Figs. 16, 17, 18, and 19, we have plotted $P_1(e)$ versus $n_E$ for the trial-and-error codes with rates 1/2, 1/3, 1/5, and 1/10, respectively. (These data were all obtained by machine calculation, and each figure required approximately three minutes on the IBM 7090 computer in the Computation Center, M.I.T.) Five different channels were used with the codes at each rate; they were chosen so as to give a wide range of $P_1(e)$.

Several features of the data in Figs. 16-19 are immediately evident. First, $P_1(e)$ does not decrease exponentially with $n_E$ for either majority decoding or APP decoding. $P_1(e)$ does decrease monotonically with $n_E$ for APP decoding, for the reason that was stated at the beginning of the proof of Theorem 16. For majority decoding, however, $P_1(e)$ has a minimum at some value of $n_E$ and increases thereafter, for the channels with large $p_0$. The reason for this is that $p_i$ is almost 1/2 for the parity checks with large $n_i$ in the longer codes, but these "bad" parity checks are given the same weight in the decision on $e_0^{(1)}$ as the "good" parity checks with small $n_i$. Ultimately, $P_1(e)$ would approach 1/2 as $n_E$ increased indefinitely for majority decoding with any value of $p_0$.

As a first basis for evaluating the performance of the trial-and-error codes with threshold decoding, we have plotted the upper bound on $P_1(e)$ given by the random-coding bound of Eq. B-10 for one channel in each of Figs. 16-19. (The rate in each case is slightly less than $R_{\text{crit}}$ for the channel chosen.) We have used $n_A$, rather than $n_E$, when plotting the random-coding bound, because $n_A$ is the number of bits on which the decision on $e_0^{(1)}$ is based with threshold decoding, whereas all $n_E$ bits are used in maximum-likelihood decoding, as assumed in deriving $P_1(e)$ for the random-coding bound. For all four rates, $P_1(e)$ for the trial-and-error codes equals $P_1(e)$ for the random-coding bound at $n_E \approx 70$. For the longer codes, $P_1(e)$ obtained by threshold decoding of the trial-and-error codes exceeds this upper bound on the average error probability of the ensemble of convolutional codes with maximum-likelihood decoding.

Fig. 18. Performance of R = 1/5 trial-and-error codes on the Binary Symmetric Channel.

Since the practical considerations pointed out in section 1.1a generally make maximum-likelihood decoding infeasible, a more realistic criterion for evaluating the performance of the trial-and-error codes is the error probability that can be attained by other practical encoding and decoding procedures. For this comparison, at rates 1/2, 1/3, and 1/5 we have chosen the nearest Bose-Chaudhuri codes,$^{28}$ for which the Peterson decoding algorithm can be used to correct any error pattern of weight $(d-1)/2$ or less, where d is the minimum distance. At rate 1/10, we have assumed the existence of block codes for which d is equal to the average weight of a nonzero code word. In computing $P(e)$, the block probability of error, we assumed that in every case an error pattern was correctable if and only if it had weight $(d-1)/2$ or less. Finally, since at $n = 63$ there were no Bose-Chaudhuri codes close to the desired rates, we shortened the code whose rate just exceeds the desired rate by dropping information symbols until the shortened code had rate equal to or less than the desired rate. In Table V, we list the complete set of block codes that were used for comparison with the trial-and-error codes.

$P(e)$, the block probability of error, and $P(e)/k$, where k is the number of information symbols, are given in Table VI for the codes of Table V and for the trial-and-error codes of Table II. At each rate, the same channel was used as for the random-coding bound in Figs. 16-19, and n was equated with $n_E$.

Table V. Block codes used for comparison with trial-and-error codes.

 n    k    d    R      Type of code
 31   16    7   .516   Bose-Chaudhuri
 54   27   11   .500   "shortened" Bose-Chaudhuri (63,36) code
 31   11   11   .355   Bose-Chaudhuri
 58   19   15   .328   "shortened" Bose-Chaudhuri (63,24) code
 31    6   15   .194   Bose-Chaudhuri
 58   11   23   .190   "shortened" Bose-Chaudhuri (63,16) code
 30    3   17   .100   average-distance code
 60    6   31   .100   average-distance code

Although no method of comparing error probabilities for block decoding and convolutional decoding is entirely satisfactory, the best basis seems to be a comparison between $P_1(e)$ and $P(e)/k$. The reasoning behind this choice is the following. Let $P_k(e)$ be the probability of making any error in decoding the first k received information bits. Then, if $P_k(e) = P(e)$, the average number of information bits decoded correctly before the first error is made will be approximately the same for convolutional decoding and block decoding. On the other hand, $P_k(e)$ is conservatively bounded as

$$P_k(e) \le k\,P_1(e) \qquad (152)$$

in which we overbound the probability of a union of events by the sum of the individual probabilities. Thus a comparison between $P(e)$ and $k\,P_1(e)$, or, equivalently, between $P(e)/k$ and $P_1(e)$, seems a reasonable choice for comparing block decoding and convolutional decoding.
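The block-code entries of Table VI rest on the assumption stated above, that a pattern is corrected if and only if its weight is at most $(d-1)/2$. Under that assumption $P(e)$ is a binomial tail. The following minimal Python sketch (function name hypothetical) reproduces the first row of Table VI and forms the comparison of $P(e)/k$ with $P_1(e)$ suggested by (152).

```python
from math import comb


def block_error_prob(n: int, d: int, p0: float) -> float:
    """P(e) on a BSC when exactly the error patterns of weight <= (d-1)/2
    are corrected (bounded-distance decoding)."""
    t = (d - 1) // 2
    return sum(comb(n, w) * p0**w * (1.0 - p0) ** (n - w)
               for w in range(t + 1, n + 1))


if __name__ == "__main__":
    # (31,16) Bose-Chaudhuri code, d = 7, on the R = 1/2 channel p0 = .0130
    n, k, d, p0 = 31, 16, 7, 0.0130
    P = block_error_prob(n, d, p0)
    p1_conv = 2.4e-5  # P_1(e) for the convolutional code of length 31, Table VI
    print(f"P(e) = {P:.2e}, P(e)/k = {P / k:.2e}, P_1(e) = {p1_conv:.1e}")
```

The sketch gives $P(e) \approx 6.8 \times 10^{-4}$ and $P(e)/k \approx 4.2 \times 10^{-5}$, in agreement with Table VI to within rounding.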

From Table VI it can be seen that threshold decoding of the trial-and-error codes compares quite favorably with the block decoding used for comparison. There is little difference in performance at the higher rates. The marked superiority of threshold decoding at R = 1/10 stems from the fact that at such a low rate it is important to correct a sizeable percentage of the error patterns with weight greater than $(d-1)/2$. Even majority decoding will correctly decode $e_0^{(1)}$ for many such error patterns. For example, in the R = 1/10, J = 26, d = 27 trial-and-error code, a decoding error is made when $e_0^{(1)} = 0$ only when 14 or more of the orthogonal parity checks are "ones." Thus, with a pattern of 14 errors among the 48 bits in the orthogonal parity checks, an error is made only in the unlikely event that each bit in error is checked by a different one of the 26 orthogonal parity checks (with $e_0^{(1)} = 0$ assumed); and if two of the bits in error are in the same parity check, no decoding error will be made unless the error pattern has weight at least 16.

c.  Tolerances  for  the  Weighting  Factors  and  Threshold 

The performance of threshold decoding with the trial-and-error codes of Table II is quite creditable, and the simple circuits that implement threshold decoding may give it considerable practical value for codes of these lengths. But, whereas the decoding circuits and the calculations of $P_1(e)$ assume that the weighting factors and the threshold can be set at precisely their correct values, it is of considerable engineering importance to know what effect inaccuracies in these quantities will have on decoding performance.

It is easy to see that the tolerance requirements are not strict. From Eq. 130, it follows that in the critical instance in which the weighted sum of the orthogonal parity checks is approximately equal to the threshold, L is approximately zero and, according to Eq. 131, the error probability of the decision is approximately 1/2. Thus it makes little difference whether the decoder assigns $e_0^{(1)}$ the value "one" or "zero." In Fig. 20a we give numerical results, for a typical case, which illustrate the effect of inaccuracies in the threshold setting. Even a 5 per cent maladjustment does not severely degrade performance. Minor inaccuracies in the weighting factors have the same effect.

Table VI. Comparison of performance of the trial-and-error convolutional codes with the block codes of Table V on the binary symmetric channel.

RATE   n_E or n   BLOCK CODES              CONVOLUTIONAL CODES
                  P(e)        P(e)/k       P_1(e)
1/2    31         6.8x10^-4   4.3x10^-5    2.4x10^-5
       54         7.3x10^-5   2.7x10^-6    6.6x10^-6
1/3    31         3.4x10^-4   3.0x10^-5    3.8x10^-5
       58         1.3x10^-5   7.0x10^-7    9.3x10^-6
1/5    31         1.6x10^-4   2.7x10^-5    1.9x10^-5
       58         4.1x10^-5   3.7x10^-6    3.4x10^-6
1/10   30         3.7x10^-3   1.2x10^-3    7.7x10^-5
       60         6.0x10^-4   1.0x10^-4    1.3x10^-5

$P_1(e)$ for the convolutional codes was found by extrapolating the data given in Figs. 16-19 for $P_1(e)$, with APP decoding assumed. The transition probabilities that were used are:

p_0 = .0130 for R = 1/2
p_0 = .0310 for R = 1/3
p_0 = .0530 for R = 1/5
p_0 = .1100 for R = 1/10

Fig. 20. (a) Effect of inaccuracies in the threshold setting on the probability of error (horizontal axis: per cent inaccuracy in threshold). (b) Effect of inaccuracies in estimation of the channel on the probability of error (horizontal axis: estimate of channel transition probability).

A more interesting question concerns the effect of errors in estimation of the channel. According to Eq. 112, the weighting factors, and the threshold also, are set according to the estimate of $p_0$, which we shall denote $p_0^*$. Fortunately, a very accurate estimate is not required. This could be inferred from Figs. 15-19, in which it is clear that even the equal weighting of all parity checks prescribed by majority decoding does not severely degrade performance in most instances. In Fig. 20b, we give numerical results for a typical case which show the minor effect on $P_1(e)$ of moderate inaccuracies in the channel estimate.

Finally, we observe that majority decoding can be thought of as the limiting case of APP decoding when $p_0 \to 0$. For, consider two parity checks, one of size $n_i = a$, the other of size $n_j = b$. According to Eq. 112, the ratio of the weighting factors $w_a$ and $w_b$ is

$$\frac{w_a}{w_b} = \frac{\log_e\left\{\left[1 + (1 - 2p_0)^a\right] / \left[1 - (1 - 2p_0)^a\right]\right\}}{\log_e\left\{\left[1 + (1 - 2p_0)^b\right] / \left[1 - (1 - 2p_0)^b\right]\right\}} \qquad (153)$$

As $p_0 \to 0$, both weighting factors approach infinity, but their ratio approaches a limit that can be evaluated by two successive applications of l'Hôpital's rule and is found to be

$$\lim_{p_0 \to 0} \frac{w_a}{w_b} = 1 \qquad (154)$$

independently of a and b. In this limit all parity checks are weighted the same, which is the distinguishing feature of majority decoding. Thus one expects the performance of APP and majority decoding to be nearly the same on channels with small $p_0$. Figs. 16-19 show that this is indeed the case.
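The limit (154) is approached only logarithmically, which a quick numerical check of (153) makes plain. In the Python sketch below (illustrative only; the sizes $a = 1$, $b = 5$ and the function name are arbitrary choices), `math.log1p` and `math.expm1` are used so that $p_i$ stays accurate for very small $p_0$.

```python
import math


def app_weight(p0: float, n_i: int) -> float:
    """w = 2 log_e(q_i/p_i) with p_i = (1/2)[1 - (1 - 2 p0)^n_i] (Eq. 148)."""
    # 1 - (1 - 2 p0)^n_i computed as -expm1(n_i * log1p(-2 p0)) to avoid cancellation
    p_i = -0.5 * math.expm1(n_i * math.log1p(-2.0 * p0))
    return 2.0 * math.log((1.0 - p_i) / p_i)


if __name__ == "__main__":
    a, b = 1, 5
    for p0 in (1e-2, 1e-4, 1e-8, 1e-16):
        print(f"p0={p0:.0e}  w_a/w_b={app_weight(p0, a) / app_weight(p0, b):.4f}")
```

The ratio falls toward 1 as $p_0$ decreases, but slowly, since both weights grow like $-2\log_e p_0$ while their difference stays near $2\log_e(b/a)$.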


d.  Modifications  of  Threshold  Decoding 

There are two basic ways in which the threshold decoding algorithms can be modified to improve decoding reliability on the binary symmetric channel (or any other channel) equipped with a feedback provision. (The basic concept of this section, improving reliability by decoding only the most likely error patterns and requesting retransmission in all other cases, was first suggested by Wozencraft and Horstein.$^{29}$)
The first method is to equip the threshold device with an alarm zone of width $2\Delta$ centered about the threshold. That is, whenever

$$\left|\sum_{i=1}^{J} w_i A_i - T\right| < \Delta,$$

an alarm is sounded and a repeat of the data is requested. From Eq. 131, it follows that for APP decoding the error-alarm method is equivalent to making a decoding decision when, and only when, the error probability does not exceed $e^{-\Delta}/(1 + e^{-\Delta})$.
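The alarm-zone rule for APP decoding can be sketched as follows. The sketch assumes, consistent with the way Eqs. 130 and 131 are used in this section, that $L = \sum_{i=1}^{J} w_i A_i - T$ is the natural-log likelihood ratio for $e_0^{(1)}$, that $T = \tfrac{1}{2}\sum_{i=0}^{J} w_i$, and that the decision error probability is $e^{-|L|}/(1 + e^{-|L|})$; the function name is hypothetical.

```python
import math


def app_decide_with_alarm(checks, weights, w0, delta):
    """Threshold decision on e_0^(1) with an alarm zone of width 2*delta.

    checks  -- observed values A_1..A_J (0 or 1)
    weights -- weighting factors w_1..w_J
    w0      -- weighting factor of the received bit itself
    Returns (decision, error_probability); decision is None when an alarm
    (repeat request) is raised.
    """
    T = 0.5 * (w0 + sum(weights))
    L = sum(w for a, w in zip(checks, weights) if a) - T
    # e^{-|L|} / (1 + e^{-|L|}), written this way to avoid overflow for large |L|
    err = math.exp(-abs(L)) / (1.0 + math.exp(-abs(L)))
    if abs(L) < delta:
        return None, err
    return (1 if L > 0 else 0), err
```

Decisions are released only when $|L| \ge \Delta$, so every released decision has error probability at most $e^{-\Delta}/(1 + e^{-\Delta})$.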

The alarm zone is useful with majority decoding also. Specifically, let M be any nonnegative integer such that $J - M$ is an even number, and let the decoding rules be: choose $e_0^{(1)} = 1$ if $(J + M)/2 + 1$ or more of the $\{A_i\}$ have value "one"; choose $e_0^{(1)} = 0$ when $(J - M)/2$ or fewer of the $\{A_i\}$ have value "one"; and otherwise signal the presence of a detected error. Then $e_0^{(1)}$ will be correctly decoded whenever there are $(J - M)/2$, or fewer, errors among the $n_A$ symbols checked by the $\{A_i\}$, and no incorrect decision on $e_0^{(1)}$ will be made when no more than $(J + M)/2$ errors occur among these symbols. The proof of this statement is a trivial modification of the proof of Theorem 1.
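For majority decoding the alarm rule depends only on the number of checks that are "one." The following is a short Python sketch of the rule as stated above (the function name is hypothetical).

```python
def majority_decide_with_alarm(checks, M):
    """Majority decoding of e_0^(1) with an alarm zone.

    Decide 1 if more than (J+M)/2 checks are "one", decide 0 if (J-M)/2 or
    fewer are, and otherwise return None (detected error).
    """
    J = len(checks)
    if M < 0 or (J - M) % 2 != 0:
        raise ValueError("M must be a nonnegative integer with J - M even")
    ones = sum(checks)
    if ones > (J + M) // 2:
        return 1
    if ones <= (J - M) // 2:
        return 0
    return None
```

With $M = 0$ and J even, this reduces to ordinary majority decoding, which never raises an alarm. Each increase of M by 2 gives up one unit of guaranteed error correction in exchange for one additional unit of guaranteed protection against incorrect decisions.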

The second method of improving reliability when a feedback channel is available will be called random supplementation of the convolutional code. This method is designed for use in conjunction with the error-count technique described in section 3.2. The idea behind this method is to cause a decoding error to propagate so that it may subsequently be detected by the error-count procedure, and a repeat then requested. This can be accomplished by adding terms with degree greater than m to the code-generating polynomials of Eq. 65, the coefficients in the extra terms being chosen independently as "one" or "zero" with probability one-half. (Since $n_E$ is increased in this process, the encoder and decoder complexities are increased in the same proportion.) Suppose that L additional terms are added to each code-generating polynomial. From Fig. 11 it can be seen that when a decoding error is made, the effect of the random supplementation is to add a burst of $L(n_0 - 1)$ random bits into the parity-check shift registers. As these bits advance into the section of the parity-check shift register that is being used for decoding, that is, into the last m stages of shift register in each chain, the probability that the error will propagate is increased. (A similar analysis applies to the Type II decoder shown in Fig. 12.) By choosing L sufficiently large, the probability of an undetected error can be made as small as desired. However, since the number of bits that must be repeated grows as L, the amount of time that is spent in requesting repeats increases. (We shall not consider here such important questions as how the data for repeats are stored, how many bits are repeated, etc. The reader is referred to the paper of Wozencraft and Horstein$^{29}$ for a full treatment of these matters. Our purpose here is merely to point out those features of the implementation of feedback strategies that are peculiar to threshold decoding.)

Finally, in order to implement the error-count procedure, it is necessary to have a means for counting all of the errors corrected in the received sequences. The output of the threshold device in the Type I decoder is a "one" each time an error is corrected in the information sequence. The number of errors corrected in the parity sequences can be obtained by modifying the Type I decoder as shown in Fig. 21. The code-generating polynomials are always selected with $g_0^{(j)} = 1$, $j = 2, 3, \ldots, n_0$; otherwise, there is an idle position in the first code word, that is, a position that contains only zeros for all first code words. Thus, from Eq. 70, it follows that

$$s_0^{(j)} = e_0^{(1)} + e_0^{(j)}, \qquad j = 2, 3, \ldots, n_0. \qquad (155)$$

Hence $e_0^{(j)*}$, the decoded estimate of $e_0^{(j)}$, can be obtained by adding $e_0^{(1)*}$ modulo-two to $s_0^{(j)}$. This is done in Fig. 21; the adders in this circuit then have a "one" output each time an error is corrected in the corresponding parity sequences.

Fig. 21. Circuit for counting all errors in received sequences.

5.2  THE  BINARY  ERASURE  CHANNEL 

A very powerful decoding method for convolutional codes has been developed by Epstein$^{30}$ for this channel. Epstein's method results in an exponential decrease in error probability at any rate less than channel capacity, with a finite average number of computations per decoded bit that is independent of code length.

In this section we shall very briefly treat APP decoding for the binary erasure channel of Fig. 13. This channel has capacity $C = q$ and is characterized by the fact that a received bit may be ambiguous, but can never be in error.$^{31}$ In section 4.2, we saw that this channel may be reduced to a binary output channel by assigning the value "zero" to all erased bits and giving these bits error probability one-half. The expression for $P_1(e)$ may be written directly as

$$P_1(e) = \tfrac{1}{2}\,p\,\Pr\left[\text{each } A_i \text{ checks at least one erased bit, not including } e_0^{(1)}\right] \qquad (156)$$

Equation 156 can be seen to hold as follows: $e_0^{(1)}$ can be determined immediately from any parity check in which no other erased bit appears; on the other hand, if there is another erased bit, such a parity check gives no information about $e_0^{(1)}$. Thus an error will be made with probability one-half when, and only when, $e_0^{(1)}$ corresponds to an erased bit and every one of the parity checks orthogonal on $e_0^{(1)}$ checks one or more erased bits in addition to $e_0^{(1)}$. Equation 156 may be written in the form

$$P_1(e) = \tfrac{1}{2}\,p \prod_{i=1}^{J} \left(1 - q^{n_i}\right) \qquad (157)$$

in which we use the fact that no bit except $e_0^{(1)}$ appears in more than one of the orthogonal parity checks.
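Equation 157 is simple enough to evaluate directly, and a short simulation of the erasure mechanism just described gives an independent check. In the Python sketch below, the check sizes and erasure probability are arbitrary illustrative values, and the function names are hypothetical.

```python
import math
import random


def p1_erasure(p: float, sizes) -> float:
    """Eq. 157: P_1(e) = (1/2) p prod_i (1 - q^{n_i}) on the binary erasure channel."""
    q = 1.0 - p
    return 0.5 * p * math.prod(1.0 - q**n for n in sizes)


def p1_erasure_sim(p: float, sizes, trials: int, seed: int = 1) -> float:
    """Monte Carlo estimate: a coin-flip error occurs when e_0^(1) is erased and
    every orthogonal check contains at least one other erased bit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if rng.random() >= p:          # e_0^(1) received unerased
            continue
        if all(any(rng.random() < p for _ in range(n)) for n in sizes):
            hits += 1
    return 0.5 * hits / trials
```

The factor one-half in both functions accounts for the coin-flip decision on an erased $e_0^{(1)}$ that no check can resolve.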

Using (157), we have computed $P_1(e)$ for two binary erasure channels in Fig. 22. Since the general features are the same as those for the binary symmetric channel, we shall not discuss them further.

Fig. 22. Performance of R = 1/2 trial-and-error codes on the Binary Erasure Channel.

As was the case for the binary symmetric channel, $P_1(e)$ cannot be made arbitrarily small when the codes satisfy the corollary of Theorem 10. Specifically, we have the following theorem.

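THEOREM 17: Given a binary convolutional code with $R = 1/n_0$ for which a set $\{A_i\}$ of $J = I(n_0 - 1)$ parity checks orthogonal on $e_0^{(1)}$ is formed in such a manner that $n_0 - 1$ of these checks have $n_i = j$ for $j = 1, 2, \ldots, I$, then, for any J, $P_1(e)$ obtained by threshold decoding satisfies

$$P_1(e) \ge \tfrac{1}{4}\,p \prod_{i=1}^{M} \left(1 - q^{i}\right)^{n_0 - 1} \qquad (158)$$

where

$$M = \left\lceil \frac{\log_2 p - \log_2(n_0 - 1) - 1}{\log_2 q} \right\rceil \qquad (159)$$

when the code is used on a binary erasure channel with erasure probability $p = 1 - q$.

PROOF 17: For the codes in Theorem 17, Eq. 157 becomes

$$P_1(e) = \tfrac{1}{2}\,p \prod_{i=1}^{I} \left(1 - q^{i}\right)^{n_0 - 1} \qquad (160)$$

and this clearly decreases monotonically with I. We also have

$$\prod_{i=M}^{\infty} \left(1 - q^{i}\right)^{n_0 - 1} \ge 1 - (n_0 - 1) \sum_{i=M}^{\infty} q^{i} = 1 - \frac{(n_0 - 1)\,q^{M}}{1 - q} \qquad (161)$$

When M is chosen to satisfy (159), $(n_0 - 1)\,q^{M}/(1 - q) \le \tfrac{1}{2}$, and (161) becomes

$$\prod_{i=M}^{\infty} \left(1 - q^{i}\right)^{n_0 - 1} \ge \tfrac{1}{2} \qquad (162)$$

From (162), it follows that the product of all of the terms in (160) cannot be smaller than one-half the product taken up to $i = M - 1$, and hence, since each factor is at most one, not smaller than one-half the product taken up to $i = M$; this is the result stated in the theorem.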
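The bound of Theorem 17 can be checked numerically against the exact expression (160). The Python sketch below (helper names hypothetical; the parameter grid is arbitrary) computes M from (159), confirms that it makes $(n_0 - 1)q^M/(1 - q) \le \tfrac{1}{2}$, and compares $P_1(e)$ with the bound (158) for increasing I.

```python
import math


def theorem17_M(p: float, n0: int) -> int:
    """Eq. 159: M = ceil([log2 p - log2(n0 - 1) - 1] / log2 q)."""
    q = 1.0 - p
    return math.ceil((math.log2(p) - math.log2(n0 - 1) - 1.0) / math.log2(q))


def p1_corollary_code(p: float, n0: int, I: int) -> float:
    """Eq. 160: P_1(e) when n0 - 1 checks have each size j = 1, ..., I."""
    q = 1.0 - p
    return 0.5 * p * math.prod((1.0 - q**i) ** (n0 - 1) for i in range(1, I + 1))


def theorem17_bound(p: float, n0: int) -> float:
    """Right side of Eq. 158: (1/4) p prod_{i=1}^{M} (1 - q^i)^(n0 - 1)."""
    q = 1.0 - p
    M = theorem17_M(p, n0)
    return 0.25 * p * math.prod((1.0 - q**i) ** (n0 - 1) for i in range(1, M + 1))


if __name__ == "__main__":
    p, n0 = 0.1, 2
    print("M =", theorem17_M(p, n0), " bound =", theorem17_bound(p, n0))
    for I in (1, 5, 20, 100, 1000):
        print(I, p1_corollary_code(p, n0, I))
```

As I grows, $P_1(e)$ levels off at a nonzero value that stays above the bound, which is the content of the theorem.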


5.3  THE  GAUSSIAN  CHANNEL 


The last channel that we shall study here is the Gaussian channel that was described briefly in section 4.2. The transmitted signal is a pulse of $+\sqrt{S}$ volts for a "one," and a pulse of $-\sqrt{S}$ volts for a "zero." The received pulse differs from the transmitted pulse by the addition of a noise pulse that is gaussianly distributed with zero mean and variance N. We shall now show how the probability inputs, $c_u^{(j)}$, to the analog circuit of Fig. 14 can be obtained for this channel. We shall also give some indication of the improvement in decoding performance that can be realized by using time-variant weighting factors, as opposed to constant weighting factors, on this channel.

From section 4.1a, we recall that the probability inputs to the circuit for calculating the weighting factors were defined as

$$c_u^{(j)} = -\log_e\left[1 - 2\Pr\left(e_u^{(j)} = 1\right)\right] \qquad (163)$$


for all u and j. These probability inputs can be obtained as follows. When a pulse of v volts is received, the corresponding received bit is assigned as a "one" or "zero" according as v is positive or negative. The log-likelihood ratio, $L_i$, for this decision is

$$L_i = \log_e \frac{1 - \Pr[e_u^{(j)} = 1]}{\Pr[e_u^{(j)} = 1]} = \log_e \frac{\frac{1}{\sqrt{2\pi N}} \exp\left[-\frac{(\sqrt{S} - |v|)^2}{2N}\right]}{\frac{1}{\sqrt{2\pi N}} \exp\left[-\frac{(\sqrt{S} + |v|)^2}{2N}\right]} \qquad (164)$$

and this reduces to

$$L_i = \frac{2\sqrt{S}}{N}\,|v| \qquad (165)$$


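Equation 165 states that $L_i$ is directly proportional to the magnitude of the received voltage. Using Eq. 131, we obtain

$$\Pr[e_u^{(j)} = 1] = \frac{e^{-L_i}}{1 + e^{-L_i}} \qquad (166)$$

Using (165) and (166), we can rewrite (163) as

$$c_u^{(j)} = -\log_e\left[1 - \frac{2e^{-L_i}}{1 + e^{-L_i}}\right] = \log_e \frac{1 + e^{-L_i}}{1 - e^{-L_i}} = \log_e \coth\frac{L_i}{2} \qquad (167)$$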


From Eq. 167, it follows that the $c_u^{(j)}$ can be obtained by passing the received pulses through the circuit of Fig. 23. The nonlinear device in this circuit is of exactly the same type as those in the analog circuit of Fig. 14.
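Equations 165-167 can be checked against a direct computation from the two Gaussian likelihoods. The Python sketch below returns $L_i$, $\Pr[e_u^{(j)} = 1]$, and $c_u^{(j)} = \log_e\coth(L_i/2)$ for one received pulse. It assumes equally likely transmitted symbols. The voltages and the values of S and N used in the check are arbitrary, and the function names are hypothetical.

```python
import math


def probability_input(v: float, S: float, N: float):
    """Return (L, Pr[e = 1], c) for a received pulse of v volts.

    Eq. 165: L = 2 sqrt(S) |v| / N
    Eq. 166: Pr[e = 1] = e^{-L} / (1 + e^{-L})
    Eq. 167: c = log_e[(1 + e^{-L}) / (1 - e^{-L})] = log_e coth(L / 2)
    """
    L = 2.0 * math.sqrt(S) * abs(v) / N
    if L == 0.0:                      # v = 0: the hard decision is a pure guess
        return 0.0, 0.5, math.inf
    p_err = math.exp(-L) / (1.0 + math.exp(-L))
    c = -math.log(math.tanh(L / 2.0))
    return L, p_err, c


def direct_error_prob(v: float, S: float, N: float) -> float:
    """Pr[hard decision wrong | v], equally likely +-sqrt(S), noise variance N."""
    right = math.exp(-(abs(v) - math.sqrt(S)) ** 2 / (2.0 * N))
    wrong = math.exp(-(abs(v) + math.sqrt(S)) ** 2 / (2.0 * N))
    return wrong / (right + wrong)
```

A pulse near zero volts yields a small $L_i$ and hence a large $c_u^{(j)}$, marking that bit as unreliable.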

In principle, the computation of $P_1(e)$ for the Gaussian channel can be carried out by using Eq. 145. However, since the average must be taken over the distribution of the weighting factors, and since the weighting factors are nonlinear functions of the gaussianly distributed received voltages, the expression is not analytically tractable in the general case. There is one simple case, however, for which $P_1(e)$ can be found directly, and which will serve to illustrate how much advantage is gained by using the time-variant weighting factors. This simple case is the code for which each of the $n_0 - 1$ parity symbols is a repeat of the single information symbol in the constraint span; such a block code can be considered as a degenerate case of a convolutional code. (A thorough treatment of this channel, including computations similar to Eqs. 172-174, is available in a paper by Bloom et al.$^{32}$)

Fig. 23. Circuit for computing $c_u^{(j)}$ for a Gaussian channel. (Input: received pulses; nonlinear device: $y = 2\log_e\coth(x/2)$.)

In this case, it is convenient to use the $B_i$ as defined in section 4.1b, for we have

$$B_i = r_0^{(i)}, \qquad i = 1, 2, \ldots, n_0 \qquad (168)$$

and

$$p_i = \Pr[e_0^{(i)} = 1], \qquad i = 1, 2, \ldots, n_0. \qquad (169)$$

Using Eqs. 164 and 165, we find that the weighting factors are

$$w_i = 2\log_e \frac{q_i}{p_i} = \frac{4\sqrt{S}}{N}\,|v_i|, \qquad i = 1, 2, \ldots, n_0, \qquad (170)$$


where $v_i$ is the received pulse for bit $r_0^{(i)}$. Then, since $r_0^{(i)}$ is assigned as a "one" when $v_i$ is positive, or as a "zero" when $v_i$ is negative, the APP decoding rule of section 4.1b becomes: choose $t_0^{(1)} = 1$ if, and only if,

$$\sum_{i=1}^{n_0} v_i > 0. \qquad (171)$$


Without loss of generality, we may assume that a "one" is transmitted. Then the sum on the left side of (171) is a Gaussian random variable with mean $n_0\sqrt{S}$ and variance $n_0 N$, since it is the sum of $n_0$ statistically independent Gaussian random variables with mean $\sqrt{S}$ and variance N. Since an error is made whenever this sum is negative, we have

$$P_1(e) = \frac{1}{\sqrt{2\pi n_0 N}} \int_{-\infty}^{0} \exp\left[-\frac{(x - n_0\sqrt{S})^2}{2 n_0 N}\right] dx \qquad (172)$$

which reduces to

$$P_1(e) = \frac{1}{\sqrt{2\pi}} \int_{\sqrt{n_0 S/N}}^{\infty} e^{-x^2/2}\, dx \qquad (173)$$
For example, if $n_0 = 10$ and $P_1(e) = 1.38 \times 10^{-3}$, then Eq. 173 gives $\sqrt{S/N} = 0.95$. From Fig. 19, we find that the same code ($n_E = 10$) gives the same $P_1(e)$ when used on a binary symmetric channel with transition probability $p_0 = 0.110$. This binary symmetric channel can be considered as derived from a Gaussian channel in such a manner that only the polarity of the output pulse can be observed, and such that

$$0.110 = \frac{1}{\sqrt{2\pi N}} \int_{-\infty}^{0} \exp\left[-\frac{(x - \sqrt{S})^2}{2N}\right] dx \qquad (174)$$

The solution of Eq. 174 gives $\sqrt{S/N} = 1.24$. Thus, in the example considered, the time-variant weighting factors yield a signal-to-noise advantage of $20\log_{10}(1.24/0.95) \approx 2.3$ dB as compared with the use of constant weighting factors.
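The arithmetic of this example can be reproduced with the inverse of the Gaussian tail function $Q(x) = \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-y^2/2}\,dy$, so that (173) reads $P_1(e) = Q(\sqrt{n_0 S/N})$ and (174) reads $0.110 = Q(\sqrt{S/N})$. The following minimal Python sketch uses the standard library's `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def q_inverse(prob: float) -> float:
    """x such that Q(x) = prob, where Q is the upper tail of N(0, 1)."""
    return -STANDARD_NORMAL.inv_cdf(prob)


# Eq. 173 with time-variant (soft) weighting: P_1(e) = Q(sqrt(n0 * S / N))
n0, p1 = 10, 1.38e-3
snr_soft = q_inverse(p1) / math.sqrt(n0)

# Eq. 174, hard decisions (binary symmetric channel): p0 = Q(sqrt(S / N))
snr_hard = q_inverse(0.110)

advantage_db = 20.0 * math.log10(snr_hard / snr_soft)
print(f"sqrt(S/N): soft {snr_soft:.3f}, hard {snr_hard:.3f}; advantage {advantage_db:.2f} dB")
```

The sketch gives about 0.95 for the time-variant case, about 1.23 for the hard-decision case, and an advantage of roughly 2.25 dB.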

5.4  SUMMARY 

The most striking result in this section is a negative one, namely that $P_1(e)$ is bounded away from zero when threshold decoding is employed with convolutional codes on the binary symmetric channel or the binary erasure channel, at least when the codes satisfy Theorem 10 and its corollary. Since it is impossible to construct better R = 1/2 codes for threshold decoding than those that satisfy Theorem 10 and its corollary, it follows that this result is rigorously true for any code at rate R = 1/2, and it probably applies also to all other rates. Notwithstanding this failure of threshold decoding to exploit the full potential of convolutional codes, it is clear from Table VI (and Figs. 16-19) that the error probability attainable by threshold decoding of convolutional codes compares favorably with that of other known decoding systems for codes of moderate length. The simplicity of instrumentation for threshold decoding may give it considerable practical value for codes of such length.

It was also shown here how modifications, such as an alarm zone around the threshold, can be made to improve the reliability of threshold decoding. Moreover, the fact, important from an engineering point of view, that the tolerance requirements in the threshold element are lenient was demonstrated. This result applies to the accuracy with which the threshold and the weighting factors are set, and to the accuracy of the prior estimate of the channel.
VI. THRESHOLD DECODING OF BLOCK CODES


6.1 INTRODUCTION

In  the  preceding  sections,  we  have  seen  how  the  concept  of  threshold  decoding  can  be 
applied  to  the  class  of  convolutional  codes.  We  shall  now  study  its  applicability  to  the 
more  familiar  class  of  block  linear  codes.  This  class  of  codes  is  distinguished  from 
the  class  of  convolutional  codes  by  the  fact  that  each  set  of  k  information  symbols  is 
independently  encoded  into  n  symbols  for  transmission.  Hereafter,  we  use  the  notation, 
an  (n,  k)  code,  to  mean  a  linear  block  code  in  systematic  form  (systematic  form  implies, 
as  explained  in  section  1.1,  that  the  first  k  transmitted  symbols  are  identical  to  the 
information  symbols). 

According to Eq. 2, the parity symbols $t_{k+1}, t_{k+2}, \ldots, t_n$ are determined from the information symbols $t_1, t_2, \ldots, t_k$ by the following set of linear equations:

$$t_j = \sum_{i=1}^{k} c_{ji}\, t_i, \qquad j = k+1, k+2, \ldots, n, \tag{175}$$


which may be represented in matrix form as

$$\begin{bmatrix} t_{k+1} \\ t_{k+2} \\ \vdots \\ t_n \end{bmatrix} = [P] \begin{bmatrix} t_1 \\ t_2 \\ \vdots \\ t_k \end{bmatrix}, \tag{176}$$

where $[P]$ is the $(n-k) \times k$ matrix of coefficients $[c_{ji}]$. It now follows from Eq. 5 that the parity checks are given by the matrix equation

$$\begin{bmatrix} s_{k+1} \\ s_{k+2} \\ \vdots \\ s_n \end{bmatrix} = [P : -I] \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}, \tag{177}$$

where $[I]$ is the identity matrix of dimension $n-k$. The matrix

$$H = [P : -I] \tag{178}$$

is  called  the  parity-check  matrix  of  the  code.  From  (177),  it  can  be  seen  that  each  row 
of  H  gives  the  coefficients  of  the  noise  digits  that  appear  in  one  of  the  parity  checks. 

We call $e_1, e_2, \ldots, e_k$ the information noise digits, and we call $e_{k+1}, \ldots, e_n$ the parity noise digits. Then from (177) it can be seen that the rows of $[P]$ give the coefficients of the information noise digits appearing in the parity checks; moreover, each parity noise digit is checked by one and only one parity check. It should be clear, by reference to the material in section 3.2, that the task of forming a set of parity checks orthogonal on $e_j$ reduces to finding disjoint sets of rows of $[P]$ for which some linear combination of the rows in each such set has a "one" in the $j$th position, but no other position has a nonzero entry in more than one of these linear combinations of rows.

For example, consider the binary (6,3) code for which the parity-check matrix is

$$H = \begin{bmatrix} 0 & 1 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 0 & 1 \end{bmatrix}. \tag{179}$$


The first row of $H$ gives the coefficients of the noise bits in parity check $s_{k+1} = s_4$. Thus $s_4$ checks $e_2$, $e_3$, and $e_4$. Similarly, $s_5$ checks $e_1$, $e_3$, and $e_5$; and $s_6$ checks $e_1$, $e_2$, and $e_6$. We observe that $s_5$ and $s_6$ are a set of two parity checks orthogonal on $e_1$. Similarly, $s_4$ and $s_6$ are orthogonal on $e_2$; and $s_4$ and $s_5$ are orthogonal on $e_3$. From Theorem 1, it now follows that $e_1$, $e_2$, and $e_3$ can all be correctly determined by majority decoding, provided that there is no more than a single error in the six received bits in a block. The transmitted information bits can then be found by adding $e_1$, $e_2$, and $e_3$ modulo-two to the received information bits.
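The majority decoder just described can be sketched in a few lines of Python. This is a minimal illustration under our own naming (the helpers `syndrome`, `encode`, and `majority_decode` are hypothetical, not circuits from the text); it computes the parity checks of Eq. 179 and votes on the orthogonal pairs $\{s_5, s_6\}$, $\{s_4, s_6\}$, and $\{s_4, s_5\}$.

```python
import itertools

# Parity-check matrix H = [P : I] of the binary (6,3) code (Eq. 179).
H = [
    [0, 1, 1, 1, 0, 0],  # s4 checks e2, e3, e4
    [1, 0, 1, 0, 1, 0],  # s5 checks e1, e3, e5
    [1, 1, 0, 0, 0, 1],  # s6 checks e1, e2, e6
]

# Row indices of H (0 = s4, 1 = s5, 2 = s6) orthogonal on each information noise bit.
ORTHOGONAL = {
    0: [1, 2],  # s5, s6 orthogonal on e1
    1: [0, 2],  # s4, s6 orthogonal on e2
    2: [0, 1],  # s4, s5 orthogonal on e3
}


def syndrome(r):
    """Parity checks s = H r (mod 2); for r = t + e they depend only on e."""
    return [sum(h * x for h, x in zip(row, r)) % 2 for row in H]


def encode(info):
    """Systematic encoding: parity bit i is (row i of P) . info (mod 2)."""
    parity = [sum(h * x for h, x in zip(row[:3], info)) % 2 for row in H]
    return list(info) + parity


def majority_decode(r):
    """Estimate e1, e2, e3 by majority vote, then correct the information bits."""
    s = syndrome(r)
    decoded = []
    for j, checks in ORTHOGONAL.items():
        ones = sum(s[i] for i in checks)
        e_hat = 1 if ones > len(checks) / 2 else 0  # J = 2: both checks must be "one"
        decoded.append(r[j] ^ e_hat)
    return decoded


def corrects_all_single_errors():
    """Try every code word with no error and with each single-bit error."""
    for info in itertools.product([0, 1], repeat=3):
        t = encode(info)
        for pos in [None] + list(range(6)):
            r = list(t)
            if pos is not None:
                r[pos] ^= 1
            if majority_decode(r) != list(info):
                return False
    return True
```

Because the parity checks depend only on the noise digits, the exhaustive loop in `corrects_all_single_errors` directly tests the single-error claim made above.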

Before  making  a  systematic  investigation  of  orthogonal  parity  checks  for  block  codes, 
we  shall  first  generalize  some  of  the  definitions  given  in  Section  III  to  the  case  of  block 
codes. 

The minimum distance, $d$, of a block $(n,k)$ code is customarily defined as the smallest number of positions in which two code words differ. Since the set of code words forms a group, $d$ is also the weight of the nonzero code word with the fewest nonzero positions.

We say that a block $(n,k)$ code can be completely orthogonalized if $d-1$ parity checks orthogonal on $e_j$ can be formed for each $j = 1, 2, \ldots, k$, that is, on each of the information noise digits.

Only the information noise digits need to be found by the decoding process, since the entire set of transmitted information digits can be found from them by using the relation $t_j = r_j - e_j$, $j = 1, 2, \ldots, k$. (The transmitted parity digits can, of course, be reconstructed from the information digits by using Eq. 175 if these parity digits are required at the receiver.) From Theorem 1, it follows that each of the $k$ information noise digits will be correctly decoded, provided that there are $\lfloor (d-1)/2 \rfloor$ or fewer errors in a received block, if majority decoding is used with a code that can be completely orthogonalized. Thus the entire set of information noise digits will be correctly decoded in this case. In other words, any error pattern that is guaranteed correctable by the minimum distance of the code is correctable by majority decoding when the code can be completely orthogonalized. Many error patterns of greater weight may also be corrected, as will be shown by a specific example in section 6.2b.

Given a set $\{A_i\}$ of $J$ parity checks orthogonal on some $e_j$, we say that the size, $m_i$, of each such parity check is the number of noise digits, exclusive of $e_j$, checked by that parity check.


6.2  MAXIMAL-LENGTH  CODES 

In section 3.7, it was shown that the class of uniform binary convolutional codes could be completely orthogonalized. These codes had the property that their minimum distance coincided with their average distance. It seems natural, then, to begin our investigation of block codes with the class of codes for which the minimum distance between code words coincides with the average distance, namely the maximal-length codes. These codes derive their name from the fact that a code word can be considered as the first $q^k - 1$ output symbols of a maximal-length $k$-stage shift register having the $k$ information symbols as initial conditions, where all the symbols are elements of GF($q$) (Ref. 33).

One may consider a block $(n,k)$ code to be the special case of a convolutional code for which $n_A = n_0 = n$ and $k_0 = k$. It then follows from Eq. 53 that the average distance between code words in a block $(n,k)$ code is

$$d_{\text{avg}} = \frac{q^{k-1}}{q^k - 1}\,(q-1)\,n. \tag{180}$$

For the maximal-length codes, since $n = q^k - 1$ and $d = d_{\text{avg}}$, we have, from Eq. 180,

$$d = q^{k-1}(q-1).$$
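Eq. 180 and the value of $d$ can be checked numerically for small binary cases. The sketch below generates each code word as the first $2^k - 1$ outputs of a $k$-stage shift register loaded with the information bits. The feedback recursions $t_{i+3} = t_{i+1} + t_i$ and $t_{i+4} = t_{i+1} + t_i$ (primitive polynomials $x^3 + x + 1$ and $x^4 + x + 1$) are our own choice of maximal-length registers, not ones specified in the text.

```python
from fractions import Fraction
from itertools import product


def shift_register_word(init, taps, n):
    """First n outputs of a binary shift register: t[m] = XOR of t[m - lag], lag in taps."""
    t = list(init)
    while len(t) < n:
        m = len(t)
        t.append(sum(t[m - lag] for lag in taps) % 2)
    return t


def distance_profile(k, taps):
    """Return (minimum weight, average nonzero weight, value of Eq. 180) for q = 2.
    For a linear code these weights are also the minimum and average distances."""
    q = 2
    n = q**k - 1
    weights = [
        sum(shift_register_word(init, taps, n))
        for init in product([0, 1], repeat=k)
        if any(init)
    ]
    d_min = min(weights)
    d_avg = Fraction(sum(weights), len(weights))
    eq180 = Fraction(q ** (k - 1), q**k - 1) * (q - 1) * n
    return d_min, d_avg, eq180
```

For both registers, every nonzero code word has weight $2^{k-1}$, so the minimum and average distances coincide with $d = q^{k-1}(q-1)$.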


The parity-check matrix, $[P : -I]$, for the maximal-length codes is such that $P$ has as rows the set of all $q^k - k - 1$ nonzero $k$-tuples, excluding the $k$ that are all zero except for a single "one" in some position. This fact can be seen in the following way. For any nonzero initial conditions, the first $q^k - 1$ output digits of a maximal-length shift register contain each nonzero $k$-tuple in some set of $k$ successive positions (with cycling from the last digit to the first allowed). Each cyclic shift of such an output is also a possible output and hence also a code word in a maximal-length code. The set of code words can then be considered the row space of the matrix $[G]$, where the first row of $G$ is any nonzero output sequence, and the remaining $k-1$ rows are each the cyclic shift of the preceding row. Every set of $k$ successive positions in the first row then is the same as some column of $G$; hence the $q^k - 1$ columns of $G$ must be the set of all nonzero $k$-tuples. This is still true after elementary row operations are performed on $G$ to put $G$ in the form $[I : P^t]$. (This can be seen by noting that the set of $k$-tuples formed from the set of all nonzero $k$-tuples by leaving all positions unchanged, except the first, which is replaced by the sum of the first and second positions, is again the set of all nonzero $k$-tuples.) Thus $P^t$ must have as columns the set of all nonzero $k$-tuples, excluding the $k$ unit vectors. But if the code is the row space of $[I : P^t]$, then the parity-check matrix is $[P : -I]$, where $t$ indicates the transposed matrix (Ref. 34). Thus $P$ must have as rows the set of all nonzero $k$-tuples excluding the unit vectors, and this is the property that was to be shown.
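The structure of $P$ derived above can be confirmed for small binary cases: encode the $k$ unit vectors with a maximal-length shift register, read off $P$ row by row from the parity positions, and compare its rows with the set of nonzero $k$-tuples minus the unit vectors. The registers used ($x^3 + x + 1$ for $k = 3$ and $x^4 + x + 1$ for $k = 4$) are our illustrative assumption.

```python
from itertools import product


def shift_register_word(init, taps, n):
    """First n outputs of a binary shift register: t[m] = XOR of t[m - lag]."""
    t = list(init)
    while len(t) < n:
        m = len(t)
        t.append(sum(t[m - lag] for lag in taps) % 2)
    return t


def p_rows_from_register(k, taps):
    """Rows of P: row i collects parity bit k + i from each unit-vector code word,
    i.e. the code is put in the systematic form [I : P^t]."""
    n = 2**k - 1
    units = [tuple(int(a == b) for b in range(k)) for a in range(k)]
    words = [shift_register_word(u, taps, n) for u in units]
    return [tuple(w[k + i] for w in words) for i in range(n - k)]


def expected_rows(k):
    """All nonzero binary k-tuples except the k unit vectors (weight >= 2)."""
    return {v for v in product([0, 1], repeat=k) if sum(v) >= 2}
```

Since the number of rows equals $n - k = 2^k - k - 1$, set equality together with the row count shows that every such $k$-tuple appears exactly once.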

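For example, the binary (7,3) maximal-length code has the parity-check matrix

$$H = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 1 \end{bmatrix}, \tag{181}$$

and this is of the form $H = [P : I]$, where the rows of $P$ are all the nonzero 3-tuples excluding the unit vectors.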

a.  Number  of  Orthogonal  Parity  Checks 

We shall now determine the maximum number of parity checks orthogonal on $e_1$ that can be formed from the parity-check matrix of a maximal-length code. Let the $q^k - k - 1$ rows of $[P]$ be decomposed into two sets of rows, $S_1$ and $S_2$, such that

(i) $S_1$ contains all $q-2$ rows of weight one with the first position nonzero, and all $k-1$ rows of weight two with a "one" in the first position and otherwise all zero except for a single position that contains a "minus one";

(ii) $S_2$ contains the remaining $q^k - q - 2k + 2$ rows of $[P]$.

The set of rows in $S_1$ corresponds to a set of parity checks orthogonal on $e_1$ because, except for $e_1$, no information noise digit is checked by more than one of the parity checks. Moreover, for each row in $S_2$ that has first digit $\beta$, there is a unique row in $S_2$ that has first digit $1-\beta$ and for which the digits in the remaining positions are the negatives of those in the former row. The sum of these two rows then corresponds to a parity check that checks $e_1$ and no other information noise digit. All of the parity checks formed in this way can be joined to those corresponding to the rows in $S_1$, and the entire set is orthogonal on $e_1$. The number, $J$, of parity checks orthogonal on $e_1$ that are formed in this way is

$$J = \#(S_1) + \tfrac{1}{2}\,\#(S_2), \tag{182}$$

where $\#(S)$ is the number of elements in the set $S$. Equation 182 gives

$$J = \tfrac{1}{2}\, q\,(q^{k-1} + 1) - 2. \tag{183}$$

It can be seen that this is the maximum number of parity checks orthogonal on $e_1$ that can be formed, because

(i) no row of $[P]$ can be used more than once in forming parity checks orthogonal on $e_1$,

(ii) the rows in $S_1$ are the largest number of single rows of $[P]$ that can be used in a set of parity checks orthogonal on $e_1$, and

(iii) the remaining rows of $[P]$ must be combined at least in pairs to produce additional parity checks orthogonal on $e_1$.

From the symmetry of $P$, it follows that the entire process can be iterated for $j = 2, 3, \ldots, k$ in order to obtain the same number of parity checks orthogonal on each $e_j$.
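The decomposition into $S_1$ and $S_2$ can be checked by direct construction. The sketch below is limited to prime $q$, so that GF($q$) is simply the integers mod $q$. This restriction is our simplifying assumption, not one made in the text. The code builds the rows of $[P]$, forms the single-row checks from $S_1$ and the paired checks from $S_2$, verifies orthogonality on $e_1$, and compares the count with Eq. 183.

```python
from itertools import product


def maxlen_p_rows(q, k):
    """Rows of P over GF(q), q prime: all nonzero k-tuples except the unit vectors."""
    rows = []
    for v in product(range(q), repeat=k):
        support = [x for x in v if x]
        if support and not (len(support) == 1 and support[0] == 1):
            rows.append(v)
    return rows


def checks_orthogonal_on_e1(q, k):
    """Return (checks, #S1, #S2); each check is (info coefficients, parity digits)."""
    rows = maxlen_p_rows(q, k)
    parity_digit = {v: i for i, v in enumerate(rows)}  # row i checks parity digit i
    s1, s2 = [], []
    for v in rows:
        others = [x for x in v[1:] if x]
        weight_one = v[0] != 0 and not others
        one_minus_one = v[0] == 1 and len(others) == 1 and others[0] == q - 1
        (s1 if weight_one or one_minus_one else s2).append(v)
    checks = [(v, {parity_digit[v]}) for v in s1]
    s2_set, used = set(s2), set()
    for v in s2:
        if v in used:
            continue
        # Partner row: first digit 1 - beta, remaining digits negated (mod q).
        w = tuple([(1 - v[0]) % q] + [(-x) % q for x in v[1:]])
        assert w in s2_set and w != v
        used.update((v, w))
        coeffs = tuple((a + b) % q for a, b in zip(v, w))
        checks.append((coeffs, {parity_digit[v], parity_digit[w]}))
    return checks, len(s1), len(s2)


def is_orthogonal_on_e1(checks):
    """e1 appears in every check; every other noise digit in at most one."""
    used_info, used_parity = set(), set()
    for coeffs, parity in checks:
        info = {i for i, c in enumerate(coeffs) if c and i > 0}
        if coeffs[0] == 0 or info & used_info or parity & used_parity:
            return False
        used_info |= info
        used_parity |= parity
    return True


def j_formula(q, k):
    """Eq. 183: J = q (q^(k-1) + 1) / 2 - 2 (always an integer)."""
    return q * (q ** (k - 1) + 1) // 2 - 2
```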

b.  Complete  Orthogonalization 

Since the maximum number, $J$, of parity checks orthogonal on any information symbol in a maximal-length code is given by (183), it follows that the code can be completely orthogonalized if and only if $J = d - 1$, where $d = (q-1)q^{k-1}$. Using (183), the difference between $d-1$ and $J$ is found to be

$$(d-1) - J = \tfrac{1}{2}(q-2)(q^{k-1} - 1). \tag{184}$$

From  Eq.  184  we  are  able  to  make  the  conclusions  stated  below. 

THEOREM 18: The binary maximal-length codes can be completely orthogonalized.
PROOF 18: Substitution of $q = 2$ in Eq. 184 gives $d - 1 = J$.

THEOREM 19: The nonbinary maximal-length codes can be completely orthogonalized when, and only when, $k = 1$, that is, when there is a single information symbol.

PROOF 19: For $q > 2$, the right-hand side of (184) will be a positive number except when the last factor vanishes. This factor vanishes if and only if $k = 1$. Thus only in this case does $d - 1 = J$.
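As a quick arithmetic check on Eq. 184 and Theorems 18 and 19, the following sketch (our own) compares $(d-1) - J$ computed from $d = (q-1)q^{k-1}$ and Eq. 183 with the closed form of Eq. 184 for several prime powers $q$ and small $k$.

```python
def gap(q, k):
    """(d - 1) - J using d = (q - 1) q^(k-1) and Eq. 183."""
    d = (q - 1) * q ** (k - 1)
    J = q * (q ** (k - 1) + 1) // 2 - 2
    return (d - 1) - J


def gap_eq184(q, k):
    """Closed form of Eq. 184: (q - 2)(q^(k-1) - 1) / 2, an integer for every q."""
    return (q - 2) * (q ** (k - 1) - 1) // 2
```

Both divisions by 2 are exact. If $q$ is even, then $q$ and $q-2$ are even. If $q$ is odd, then $q^{k-1}+1$ and $q^{k-1}-1$ are even.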

Theorem 18 establishes the important result that the class of binary maximal-length codes can be completely orthogonalized. These are $(2^k - 1, k)$ codes with minimum distance $2^{k-1}$. For large $k$, the rate, $k/n = k/(2^k - 1)$, is very small.

The fact that threshold decoding is not limited to correcting only the error patterns of weight $\lfloor (d-1)/2 \rfloor$ or less is especially important for such low-rate codes. As a specific example, consider the (1023,10) maximal-length code with $d = 512$. Suppose that this code is used on a binary symmetric channel with transition probability $p_0 = 0.25$. The average number of errors in a received block is then $1023\,p_0$, or approximately 256. On the other hand, $\lfloor (d-1)/2 \rfloor = 255$. Thus the probability of error would be approximately 1/2 for a decoding algorithm that was capable of correcting only errors of weight $\lfloor (d-1)/2 \rfloor$ or less. Suppose now that majority decoding is used to determine each of the 10 information noise bits from the set of $d - 1 = 511$ parity checks orthogonal on each such bit. The total probability of error is approximately $5 \times 10^{-7}$. The reason for this drastic reduction can be seen as follows. Suppose that $e_1 = 0$; then $e_1$ will be incorrectly decoded only when more than 256 of the 511 parity checks orthogonal on $e_1$ are "ones." Since each of these parity checks includes two noise bits exclusive of $e_1$, the probability that such a check is "one" can be found from Lemma 3 to be 0.375. The probability that more than 256 of these 511 parity checks are "ones" can then be calculated to be less than $3.1 \times 10^{-8}$. In a similar manner, the probability that $e_1$ is incorrectly decoded when $e_1 = 1$ is found to be less than $1.0 \times 10^{-7}$. The average error probability in decoding $e_1$ is thus less than $4.9 \times 10^{-8}$, since $(0.750)(3.1 \times 10^{-8}) + (0.250)(1.0 \times 10^{-7}) < 4.9 \times 10^{-8}$. The probability of any decoding error among all 10 information noise bits is certainly less than 10 times the probability of incorrectly decoding $e_1$, or less than $4.9 \times 10^{-7}$.
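The binomial tails in this example can be evaluated directly. In the sketch below, we assume the majority rule decides $e_1 = 1$ when at least 256 of the 511 checks are "ones." The stricter condition "more than 256" used above can only make the $e_1 = 0$ term smaller. The sums are computed term by term in floating point. Within this assumption, they fall below the bounds quoted above; the sketch does not attempt to reproduce the exact figures.

```python
from math import comb


def binomial_tail(n, p, m):
    """P[X >= m] for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m, n + 1))


p0 = 0.25                     # BSC transition probability
J = 511                       # d - 1 parity checks orthogonal on e1
p_check = 2 * p0 * (1 - p0)   # Lemma 3, two noise bits besides e1: Pr[check = "one" | e1 = 0]
m = J // 2 + 1                # assumed rule: decide e1 = 1 when at least 256 checks are "one"

# e1 = 0: decoding error when at least m checks are "one".
err_e1_is_0 = binomial_tail(J, p_check, m)
# e1 = 1: a check is "zero" with probability p_check; error when at most m - 1
# checks are "one", i.e. when at least J - m + 1 checks are "zero".
err_e1_is_1 = binomial_tail(J, p_check, J - m + 1)

avg_err = (1 - p0) * err_e1_is_0 + p0 * err_e1_is_1
block_bound = 10 * avg_err    # union bound over the 10 information noise bits
```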

Theorem 19 emphasizes the remark made in section 1.2 to the effect that there are difficulties in applying threshold decoding to nonbinary codes in an efficient manner. For large $k$, it follows from (183) that $J \approx \tfrac{1}{2}\, n$, independent of $q$, whereas from Eq. 180 we have $d \approx \frac{q-1}{q}\, n$. This indicates the fundamental difficulty that we have encountered in trying to orthogonalize nonbinary codes, namely that the number of orthogonal parity checks that can be formed is about the same as can be formed for a binary code with the same $n$ and $k$. This means that full advantage is not being taken of the higher-order alphabet of the nonbinary codes. On the other hand, although complete orthogonalization is not possible for the nonbinary maximal-length codes, the simplicity of the threshold-decoding algorithms might make them reasonable choices for decoding these codes and, perhaps, other nonbinary codes also; but we shall not pursue the matter further. Hereafter, we shall consider only binary codes.

6.3  THRESHOLD-DECODING  CIRCUITS  FOR  CYCLIC  CODES 

The maximal-length codes are cyclic codes, that is, codes for which a cyclic shift of any code word (replacing $t_1 t_2 \ldots t_n$ with $t_n t_1 \ldots t_{n-1}$) is again a code word. Peterson has shown that any $(n,k)$ cyclic code can be encoded by a linear sequential network containing either $k$ or $n-k$ stages of shift register (Ref. 25). The cyclic structure of these codes also makes possible simple threshold-decoding circuits, as we shall now show.

A cyclic code can be completely orthogonalized if, and only if, $d-1$ parity checks orthogonal on $e_1$ can be formed. This follows from the fact that the parity checks on $e_2$ must be able to be put into the same form as the parity checks on $e_1$, but with all indices increased by one cyclically. If $J$ parity checks orthogonal on $e_1$ can be formed, then $J$ parity checks orthogonal on $e_2$ can be formed, and conversely. A similar argument applies for $e_3, e_4, \ldots, e_k$. This feature of cyclic codes makes possible the use of threshold decoders very similar to the Types I and II decoders for convolutional codes. We call these circuits the cyclic Types I and II decoders, and we shall illustrate their construction by using, as an example, the (7,3) maximal-length code having the parity-check matrix of Eq. 181.

The cyclic Type I decoder for this code is given in Fig. 24. The received bits are fed simultaneously into $n$ stages of buffer storage and into an $(n-k)$-stage encoding circuit. After $n$ shifts, the $(n-k)$-stage register contains the modulo-two sum of the received parity bits and the encoded received information bits. By Theorem 9, this sum is just the set of parity checks. The parity checks are then combined to produce the set of parity checks orthogonal on $e_1$, after which the set of orthogonal parity checks is weighted and compared with the threshold. The output of the threshold element is $e_1^*$, the decoded estimate of $e_1$. This is added to $r_1$, which is just emerging from the buffer, to give $t_1^*$, the decoded estimate of $t_1$. Since the code is cyclic, the circuit as shown (without the dotted connection) will continue to operate correctly at successive time instants, the output after the next shift being $t_2^*$, and so forth.

Fig. 24. Cyclic Type I decoder for (7,3) maximal-length code.
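A behavioral sketch of the cyclic Type I decoder for the (7,3) code follows. It is a software model, not a register-level one. Instead of shifting the $(n-k)$-stage register, it recomputes the parity checks of Eq. 181 from a cyclically rotated copy of the received word. The checks $s_4$, $s_7$, and $s_5 + s_6$ orthogonal on $e_1$ are the ones the $S_1$/$S_2$ construction of section 6.2a gives for this matrix. The function names are hypothetical.

```python
# Parity-check matrix of the (7,3) maximal-length code (Eq. 181), rows s4..s7.
H = [
    [1, 1, 0, 1, 0, 0, 0],  # s4
    [0, 1, 1, 0, 1, 0, 0],  # s5
    [1, 1, 1, 0, 0, 1, 0],  # s6
    [1, 0, 1, 0, 0, 0, 1],  # s7
]
N, K = 7, 3


def parity_checks(r):
    """Contents of the (n-k)-stage register after n shifts: s = H r (mod 2)."""
    return [sum(h & x for h, x in zip(row, r)) % 2 for row in H]


def estimate_e1(r):
    """Majority decision on the J = 3 checks orthogonal on e1: s4, s7, s5 + s6."""
    s4, s5, s6, s7 = parity_checks(r)
    orthogonal = [s4, s7, s5 ^ s6]
    return 1 if sum(orthogonal) > len(orthogonal) / 2 else 0


def cyclic_type1_decode(r):
    """Decode t1..tk. Step j rotates the word left by j so that r_(j+1) is in the
    first position; this is valid because the code is cyclic."""
    decoded = []
    for j in range(K):
        rotated = r[j:] + r[:j]
        decoded.append(r[j] ^ estimate_e1(rotated))  # t* = r + e* (mod 2)
    return decoded


def encode(info):
    """Systematic (7,3) encoder: parity bit i is (row i of P) . info (mod 2)."""
    return list(info) + [sum(h & x for h, x in zip(row[:K], info)) % 2 for row in H]
```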

The cyclic Type I circuit is of the type originally proposed by Meggitt for decoding an arbitrary cyclic code (Ref. 35). Meggitt specified the main decision element in the circuit only as a combinatorial element that has a "one" output for all parity-check patterns in the $(n-k)$-stage register corresponding to error patterns with a "one" in the first position. In general, there seems to be no way to construct a simple combinatorial element for this circuit; the difficulty may be seen as follows. There is a single error pattern of weight one with a "one" in the first position, but there are $n-1$ of weight two, $(n-1)(n-2)/2$ of weight three, and so on, with a "one" in the first position. Each such error pattern must give a distinct parity-check pattern if it is correctable. Thus, if the decoder is to be able to correct any combination of $T$ or fewer errors, the combinatorial element must be able to recognize approximately $n^{T-1}$ parity-check patterns. It is not practical to attempt a synthesis of such a combinatorial element by the standard minimization techniques used in logical design when $T$ is greater than approximately 2. The only hope is to find some specific structural properties of the code that will suggest a practical form for the combinatorial element.

When the cyclic code can be completely orthogonalized, threshold decoding suggests the form of the combinatorial element, namely a threshold logical element with the necessary modulo-two adders needed to form the orthogonal parity checks. In such a case, Meggitt's theoretically general circuit becomes a practical decoding circuit.

Meggitt included the connections shown dotted in Fig. 24 in his general cyclic decoder. These connections cause the $(n-k)$-stage shift register to contain the parity checks corresponding to the received symbols as altered by the decoding process. After decoding is complete, this register will contain only "zeros" if the output is a valid code word. The presence of any "ones" indicates that an uncorrectable error has been detected. Finally, it should be clear from Fig. 24 that the buffer storage can be reduced to $k$ stages when, as is usual, the decoded parity symbols are not of interest. When this is done, the cyclic Type I decoder contains a total of $n$ stages of shift register.

The cyclic Type II decoder for the same (7,3) maximal-length code is shown in Fig. 25. This circuit uses the threshold-decoding algorithms in the form specified in section 4.1b. In this case the set $\{B_i\}$ of equations used in the decoding decision is given by

$$B_0 = r_1 \tag{185}$$

and

$$B_i = \text{(sum of the received bits, exclusive of } r_1\text{, whose noise bits are checked by the } i\text{th parity check orthogonal on } e_1\text{)}. \tag{186}$$

The manner in which these equations are weighted and compared with the threshold is exactly the same as described in section 4.1b.

The cyclic Type II decoder is the essence of simplicity. The received bits are first stored in an $n$-stage shift register. The $\{B_i\}$ are formed by adding the appropriate received bits. The output of the threshold element is $t_1^*$, the decoded estimate of $t_1$. Since the code is cyclic, the same circuit gives $t_2^*$ as its output after the shift register is cycled once, and so on.
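The cyclic Type II decoder admits an even shorter sketch. This sketch assumes the code of Eq. 181, with checks $s_4$, $s_7$, and $s_5 + s_6$ orthogonal on $e_1$. For that code, Eq. 186 gives $B_1 = r_2 + r_4$, $B_2 = r_3 + r_7$, and $B_3 = r_5 + r_6$. The element below follows the majority rule with error alarm listed for Fig. 25 and returns `None` for the alarm. The function names are hypothetical.

```python
N, K = 7, 3


def type2_element(r):
    """Threshold element of the cyclic Type II decoder for the (7,3) code.
    r holds r1..r7 (0-based indices 0..6). Returns 1, 0, or None ("error alarm")."""
    b = [
        r[0],         # B0 = r1
        r[1] ^ r[3],  # B1 = r2 + r4 (check s4)
        r[2] ^ r[6],  # B2 = r3 + r7 (check s7)
        r[4] ^ r[5],  # B3 = r5 + r6 (check s5 + s6)
    ]
    ones = sum(b)
    if ones >= 3:
        return 1
    if ones == 2:
        return None
    return 0


def cyclic_type2_decode(r):
    """Outputs t1*, t2*, t3*, cycling the stored word by one position between decisions."""
    outputs = []
    word = list(r)
    for _ in range(K):
        outputs.append(type2_element(word))
        word = word[1:] + word[:1]
    return outputs
```

A single error disturbs at most one of the four inputs, so it never produces a tie. A double error that involves $r_1$ and one other input can produce a 2-2 tie, which raises the alarm.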

The  manner  in  which  the  weighting  factors  and  the  threshold  are  calculated  is  exactly 
the  same  for  the  cyclic  Types  I  and  II  decoders  as  for  the  convolutional  Types  I  and  II 
decoders,  and  no  further  explanation  will  be  given  here.  The  set  of  weights  and  the 
threshold  are  constants  for  the  binary  symmetric  channel  when  APP  decoding  is  used, 
and  are  always  constants  for  majority  decoding. 

The  main  features  of  the  cyclic  Types  I  and  II  decoders  can  be  summarized  in  the 
following  theorem. 


Fig. 25. Cyclic Type II decoder for (7,3) maximal-length code. (Received bits $r_7 \ldots r_2, r_1$ are stored in the shift register. Majority element output: "one" if 3 or 4 inputs are "one"; "error alarm" if 2 are "one"; "zero" if 1 or 0 are "one.")
THEOREM 20: Given a cyclic $(n,k)$ binary code that can be completely orthogonalized, any combination of $\lfloor (d-1)/2 \rfloor$ errors, or fewer, in a block can be corrected by a decoding circuit containing $n$ stages of shift register and one threshold logical element.

In Theorem 20, we assume that majority decoding is used on the set of orthogonal parity checks that can be formed on $e_1$. We emphasize the phrase "that can be completely orthogonalized" in the statement of this theorem: the code must have a structure that permits the formation of $d-1$ parity checks orthogonal on $e_1$ if the cyclic Types I and II decoders are to be efficient and practical decoding circuits. There is no reason to suspect that such a structure is a general property of cyclic codes.

When APP decoding is used to determine $e_1$ from the set of parity checks orthogonal on $e_1$, it becomes necessary to use weighting factors on each of the input lines to the threshold element of the cyclic Types I and II decoders. For the binary symmetric channel, these weighting factors are constants, as explained in section 4.1. For a time-variant binary output channel, the weighting factors and the threshold can be computed by an analog circuit similar to that of Fig. 14. Such a circuit for the (7,3) maximal-length code is shown in Fig. 26. The inputs to this circuit are $c_1, c_2, \ldots, c_7$, where $c_i = -\log_e[1 - 2\Pr(e_i = 1)]$. The operation of this circuit is so similar to that of the circuit in Fig. 14 that no further explanation is required.
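A digital sketch of the quantities such a circuit computes is given below. Section 4.1 is not reproduced here, so we assume the standard APP form: weights $w_i = \ln[(1-\pi_i)/\pi_i]$, where $\pi_i$ is the probability that $B_i$ disagrees with $t_1$, and a threshold of $\tfrac{1}{2}\sum_i w_i$. Scaling all weights and the threshold by a common factor, such as 2, leaves every decision unchanged. The inputs $c_i$ are convenient because $1 - 2\pi_i = \exp(-\sum_j c_j)$ over the noise digits in $B_i$. The helper names and the noise-digit lists for the (7,3) code are our assumptions.

```python
import math

# Noise digits (1-based) that disturb each input of the (7,3) Type II decoder:
# B0 = r1 depends on e1; B1, B2, B3 on the two noise bits of their orthogonal check.
B_NOISE = [[1], [2, 4], [3, 7], [5, 6]]


def c_value(gamma):
    """c_i = -ln(1 - 2 Pr(e_i = 1)), assuming Pr(e_i = 1) < 1/2."""
    return -math.log(1.0 - 2.0 * gamma)


def app_weights(gammas):
    """Weights ln[(1 - pi_i) / pi_i] and threshold (half the total weight),
    computed from sums of the c's: 1 - 2 pi_i = exp(-sum of c_j over the check)."""
    c = [c_value(g) for g in gammas]
    weights = []
    for noise in B_NOISE:
        x = math.exp(-sum(c[i - 1] for i in noise))  # = 1 - 2 pi_i
        weights.append(math.log((1.0 + x) / (1.0 - x)))
    return weights, sum(weights) / 2.0


def app_decide(b, weights, threshold):
    """t1* = 1 when the weighted sum of the B_i exceeds the threshold."""
    return 1 if sum(w * bi for w, bi in zip(weights, b)) > threshold else 0
```

On a binary symmetric channel all the $\Pr(e_i = 1)$ are equal, so the weights are constants, as the text states. Only $w_0$ differs from the other three, because $B_0$ involves a single noise bit.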


Fig.  26.  Analog  circuit  for  computing  weighting  factors  and  threshold. 


6.4 BOSE-CHAUDHURI (15,7) CODE


Except for the maximal-length codes, we have been unable to find classes of good block $(n,k)$ codes that can be completely orthogonalized. However, in certain isolated cases we have found good codes that can be completely orthogonalized. One such code is the (6,3) code of Eq. 179. A more interesting case is the (15,7), $d = 5$, binary Bose-Chaudhuri code. This code has the parity-check matrix $H = [P : I]$ for which the rows of $P$ are

$$\begin{array}{r@{\quad}ccccccc}
3 \rightarrow s_8: & 1 & \boxed{1} & 0 & \boxed{1} & 0 & 0 & 0 \\
s_9: & 0 & 1 & 1 & 0 & 1 & 0 & 0 \\
s_{10}: & 0 & 0 & 1 & 1 & 0 & 1 & 0 \\
s_{11}: & 0 & 0 & 0 & 1 & 1 & 0 & 1 \\
3 \rightarrow s_{12}: & 1 & 1 & 0 & 1 & 1 & 1 & 0 \\
s_{13}: & 0 & 1 & 1 & 0 & \boxed{1} & 1 & 1 \\
3 \rightarrow s_{14}: & 1 & 1 & 1 & 0 & 0 & 1 & 1 \\
3 \rightarrow s_{15}: & 1 & 0 & \boxed{1} & 0 & 0 & 0 & \boxed{1}
\end{array}$$

and $d - 1 = 4$ parity checks orthogonal on $e_1$ can be formed as shown, namely $s_8$, $s_9 + s_{10} + s_{12}$, $s_{13} + s_{14}$, and $s_{15}$, each of size 3. (The same convention for indicating the formation of the orthogonal parity checks is used here as was described in sections 3.5 and 3.6.) Since this is a cyclic code, it follows that the code can be completely orthogonalized and can be decoded by either the cyclic Type I or the cyclic Type II decoder.
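The orthogonal checks for the (15,7) code can be verified, and the resulting majority decoder exercised, with the sketch below. The encoder recursion $t_{j+7} = t_j + t_{j+1} + t_{j+3}$ is our assumption, chosen because it reproduces the rows of $P$ listed above. Its characteristic polynomial $x^7 + x^3 + x + 1$ divides $x^{15} - 1$, so the code is cyclic and cyclic shifts of the received word can be used for $e_2, \ldots, e_7$. With $J = 4$, majority decoding should correct every pattern of two or fewer errors.

```python
from itertools import combinations

N, K = 15, 7
# Checks orthogonal on e1, as 0-based indices into s8..s15:
# s8, s9 + s10 + s12, s13 + s14, s15.
ORTHOGONAL_ON_E1 = [[0], [1, 2, 4], [5, 6], [7]]


def encode(info):
    """Code word via t_(j+7) = t_j + t_(j+1) + t_(j+3) (assumed encoder recursion)."""
    t = list(info)
    while len(t) < N:
        m = len(t)
        t.append(t[m - 7] ^ t[m - 6] ^ t[m - 4])
    return t


# Row i of P: parity bit 8 + i of the code word for each unit information vector.
UNIT_WORDS = [encode([int(a == b) for b in range(K)]) for a in range(K)]
P = [[w[K + i] for w in UNIT_WORDS] for i in range(N - K)]


def parity_checks(r):
    """s_(8+i) = (row i of P) . (received information bits) + r_(8+i) (mod 2)."""
    return [(sum(p & x for p, x in zip(row, r[:K])) + r[K + i]) % 2
            for i, row in enumerate(P)]


def noise_support(rows):
    """Noise digits (0-based) checked by the sum of the given parity checks."""
    info = [sum(P[i][j] for i in rows) % 2 for j in range(K)]
    return {j for j in range(K) if info[j]} | {K + i for i in rows}


def is_orthogonal_on_e1():
    supports = [noise_support(rows) for rows in ORTHOGONAL_ON_E1]
    others = [s - {0} for s in supports]
    return all(0 in s for s in supports) and all(
        a.isdisjoint(b) for a, b in combinations(others, 2))


def decode(r):
    """Majority decoding of e1 (J = 4: "one" iff at least 3 checks are "one"),
    applied to cyclic shifts of r to obtain t1* ... t7*."""
    out = []
    for j in range(K):
        s = parity_checks(r[j:] + r[:j])
        votes = sum(sum(s[i] for i in rows) % 2 for rows in ORTHOGONAL_ON_E1)
        out.append(r[j] ^ (1 if votes > len(ORTHOGONAL_ON_E1) / 2 else 0))
    return out
```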

Aside from the fact that the (15,7) Bose-Chaudhuri code is a useful code, this example is interesting in that this code is one for which Peterson attempted a computer minimization of the logical expression defining the truth table for the combinatorial element in a Meggitt decoder, and was unable to find a simple logical circuit from this approach (Ref. 36). The orthogonal parity-check approach leads naturally to a simple logical circuit for the combinatorial element in the Meggitt decoder for this code, namely a majority logic element and two modulo-two adders.

6.5  A  SUFFICIENT  CONDITION  FOR  COMPLETE  ORTHOGONALIZATION 

We  have  not  been  able  to  formulate  a  good  set  of  conditions  sufficient  to  establish 
whether  or  not  a  binary  (n,k)  code  can  be  completely  orthogonalized.  Our  only  result 
in  this  direction  is  the  following  theorem  whose  proof  is  given  in  Appendix  D. 

THEOREM 21: Any binary block $(n,k)$ code with $k \le 3$ can be completely orthogonalized.

Theorem 21 establishes the maximum value of $k$ for which a block $(n,k)$ code can always be completely orthogonalized. The simplest nontrivial $k = 4$ code is the Hamming (7,4), $d = 3$, code for which the parity-check matrix is

$$H = \begin{bmatrix} 1 & 0 & 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 & 1 \end{bmatrix}, \tag{188}$$

and it is readily verified that there is no way to form $d-1$ parity checks orthogonal on $e_1$. Since this is a cyclic code, it follows also that there is no way to form $d-1$ parity checks orthogonal on any of the information noise bits.
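The claim that $d - 1 = 2$ parity checks orthogonal on $e_1$ cannot be formed for the Hamming (7,4) code can be verified exhaustively. A code with $n-k$ parity checks has only $2^{n-k} - 1$ nonzero combinations of them. The brute-force search below is our own illustration and is practical only for small $n-k$. It also recovers the counts for the (6,3) and (7,3) examples.

```python
from itertools import combinations


def candidate_checks(H, j):
    """Noise-digit supports (excluding position j) of every nonzero combination
    of the rows of H that checks e_j."""
    n, cands = len(H[0]), []
    for mask in range(1, 1 << len(H)):
        v = [0] * n
        for i, row in enumerate(H):
            if (mask >> i) & 1:
                v = [a ^ b for a, b in zip(v, row)]
        if v[j]:
            cands.append(frozenset(p for p in range(n) if v[p] and p != j))
    return cands


def max_orthogonal(H, j=0):
    """Largest number of parity checks orthogonal on e_j (exhaustive search)."""
    cands, best = candidate_checks(H, j), 0
    for size in range(1, len(cands) + 1):
        if any(all(a.isdisjoint(b) for a, b in combinations(group, 2))
               for group in combinations(cands, size)):
            best = size
        else:
            break  # no disjoint family of this size, hence none larger
    return best


HAMMING_7_4 = [[1, 0, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1, 0], [0, 1, 1, 1, 0, 0, 1]]
MAXLEN_7_3 = [[1, 1, 0, 1, 0, 0, 0], [0, 1, 1, 0, 1, 0, 0],
              [1, 1, 1, 0, 0, 1, 0], [1, 0, 1, 0, 0, 0, 1]]
CODE_6_3 = [[0, 1, 1, 1, 0, 0], [1, 0, 1, 0, 1, 0], [1, 1, 0, 0, 0, 1]]
```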

6.6  SUMMARY 

We have seen how threshold decoding can be applied to block $(n,k)$ codes. For cyclic codes, we have shown that the threshold-decoding circuits can be made quite simple. However, for the most part, it appears that efficient use of threshold decoding for block $(n,k)$ codes is confined to binary low-rate codes such as the maximal-length codes. This consideration is reinforced by Theorem 21, which states that complete orthogonalization of a binary code is guaranteed to be possible only when the number of information symbols is 3 or less. In order to bypass this limitation on the class of codes to which threshold decoding can be applied in an efficient manner, we shall, in Section VII, generalize the manner of application of the threshold-decoding algorithms in such a way that they can be applied to an enlarged class of block codes in an efficient manner.


VII.  GENERALIZED  THRESHOLD  DECODING  FOR  BLOCK  CODES 


In  Section  VI  we  were  able  to  establish  that  the  threshold-decoding  algorithms  could 
be  applied  efficiently  to  the  binary  maximal-length  codes,  a  class  of  low-rate  codes. 
The  effectiveness  of  the  basic  algorithms  appears  to  be  limited  to  such  low-rate  codes, 
in the sense that complete orthogonalization of a code is usually not possible for high-rate codes. We now extend the procedure for forming orthogonal parity checks to
enlarge  the  class  of  codes  for  which  threshold  decoding  can  be  efficient.  We  limit  our 
treatment  to  binary  codes.  (It  is  not  yet  clear  to  what  extent  the  generalized  procedure 
is  applicable  to  nonbinary  codes.) 

Before describing the generalized procedure, we shall illustrate its use by an example. Consider the Hamming (7,4), $d = 3$, code for which the parity-check matrix $H = [P : I]$ is given in Eq. 188. We have seen that this code cannot be completely orthogonalized. For this code, the rows of $P$ correspond to parity checks $s_5$, $s_6$, and $s_7$, and are

$$\begin{array}{r@{\quad}cccc}
s_5: & 1 & 0 & 1 & 1 \\
2 \rightarrow s_6: & \boxed{1} & 1 & 1 & 0 \\
2 \rightarrow s_7: & 0 & 1 & 1 & \boxed{1}
\end{array}$$

(As in Section III, we use a numbered arrow to indicate an orthogonal parity check and its size, and we place a box about each nonzero coefficient of an information noise bit, other than those in the sum, appearing in the parity checks orthogonal on that sum.)

As shown, $s_6$ and $s_7$ form a set of $d - 1 = 2$ parity checks orthogonal on the sum $e_2 + e_3$. Provided that no more than a single error is in the received block, majority decoding can be used to determine this sum correctly. Let us call this decoded estimate $(e_2 + e_3)^*$. Similarly, $s_5$ and $s_7$ form $d - 1 = 2$ parity checks orthogonal on $e_3 + e_4$, and we can find $(e_3 + e_4)^*$ by majority decoding.

Now consider modifying the original parity checks to form a set $s_5'$ and $s_6'$, given by

$$s_5' = s_5 + (e_3 + e_4)^* \quad\text{and}\quad s_6' = s_6 + (e_2 + e_3)^*. \tag{189}$$

From Eq. 189, it can be seen that if the sums were correctly decoded, then $s_5'$ and $s_6'$ are parity checks corresponding to the parity-check matrix $H' = [P' : I]$, where the rows of $P'$ are

$$\begin{array}{cccc} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{array} \tag{190}$$

and $d-1 = 2$ parity checks orthogonal on $e_1$ can now be formed. The entire decoding process will be correct provided that no more than a single error is present in the received block. Since the code is cyclic, $e_2$, $e_3$, and $e_4$ can all be determined in a similar manner.
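The two-step procedure for the Hamming (7,4) code can be sketched as follows. Both stages use majority decoding on two checks. For $J = 2$, this means deciding "one" only when both checks are "ones." The remaining information noise bits are obtained from cyclic shifts of the received word, which is valid because the code is cyclic. The function names are hypothetical.

```python
K, N = 4, 7
# H = [P : I] of the Hamming (7,4) code (Eq. 188); rows are s5, s6, s7.
H = [
    [1, 0, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def parity_checks(r):
    return [sum(h & x for h, x in zip(row, r)) % 2 for row in H]


def two_step_e1(r):
    """Two-step orthogonalization for e1."""
    s5, s6, s7 = parity_checks(r)
    # Step 1: majority decoding of two sums (J = 2, so both checks must be "one").
    e2_plus_e3 = s6 & s7  # s6, s7 orthogonal on e2 + e3
    e3_plus_e4 = s5 & s7  # s5, s7 orthogonal on e3 + e4
    # Step 2: modified checks of Eq. 189; if step 1 was correct they equal
    # e1 + e5 and e1 + e6, which are orthogonal on e1.
    s5_mod = s5 ^ e3_plus_e4
    s6_mod = s6 ^ e2_plus_e3
    return s5_mod & s6_mod


def decode(r):
    """t1* ... t4*, using cyclic shifts of r to reach e2, e3, e4."""
    return [r[j] ^ two_step_e1(r[j:] + r[:j]) for j in range(K)]


def encode(info):
    """Systematic encoder: parity bit i is (row i of P) . info (mod 2)."""
    return list(info) + [sum(h & x for h, x in zip(row[:K], info)) % 2 for row in H]
```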
7.1 L-STEP ORTHOGONALIZATION


The generalized orthogonalization procedure can now be described. Start with the original set of parity checks corresponding to the parity-check matrix H, and suppose that we can form sets of at least d-1 parity checks orthogonal on selected sums of information noise bits. (If the code cannot be L-step orthogonalized, one might still use the decoding algorithm, forming as many orthogonal parity checks at each stage as possible.)

These sums are then assumed to be known and are treated as additional parity checks, that is, as known sums of noise bits; threshold decoding is used to estimate each of them. The additional parity checks are then combined with the original parity checks to produce a set of parity checks corresponding to a parity-check matrix H'. Provided that all the sums were correctly decoded, H' will be a true parity-check matrix. H' is then transformed to H'' by a similar process.

Ultimately, some $H^{(L)}$ is produced from which d-1 parity checks orthogonal on $e_j$ can be formed. If this procedure can be carried out for all $e_j$, j = 1, 2, ..., k, then we say that the code can be L-step orthogonalized.
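Every stage of the procedure relies on a set of parity checks being orthogonal on a sum. That notion can be stated as a small Python predicate. In this sketch the function name is hypothetical, and each check is written as the set of indices of the noise bits it checks. For the (7,4) example, s5 = {1, 3, 4, 5}, s6 = {1, 2, 3, 6}, and s7 = {2, 3, 4, 7}, where indices 5, 6, and 7 denote the parity noise bits.

```python
def orthogonal_on(checks, target):
    """Return True if the parity checks are orthogonal on the sum of the
    noise bits whose indices are in `target`.

    Each check is given as the set of noise-bit indices it checks. The checks
    are orthogonal on the sum if every check contains every bit of `target`
    and every other bit appears in at most one check.
    """
    target = set(target)
    seen = set()
    for check in checks:
        check = set(check)
        if not target <= check:
            return False
        others = check - target
        if others & seen:
            return False
        seen |= others
    return True


if __name__ == "__main__":
    s5, s6, s7 = {1, 3, 4, 5}, {1, 2, 3, 6}, {2, 3, 4, 7}
    print(orthogonal_on([s6, s7], {2, 3}))        # True
    print(orthogonal_on([s6, s7], {3}))           # False: e2 appears twice
    print(orthogonal_on([{1, 5}, {1, 6}], {1}))   # True: s5', s6' of Eq. 189
```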

According to this definition, one-step orthogonalization is the same as complete orthogonalization as defined in section 6.1. We see that the Hamming (7,4) code can be 2-step orthogonalized.

Suppose majority decoding is used as the decision rule at every stage of the orthogonalization procedure, and the code can be L-step orthogonalized. Then any combination of $\lfloor (d-1)/2 \rfloor$ or fewer errors in the received block will be corrected. This follows from the fact that the set of sums at the first stage will then be correctly decoded, and H' will be a true parity-check matrix. By the same reasoning, $H'', H''', \ldots, H^{(L)}$ will be true parity-check matrices. Hence the majority decision on $e_j$ will be correct for j = 1, 2, ..., k.

It is plausible that better results could be obtained by using APP decoding at each stage. This must be true at the first stage, because the APP decoding rule gives at least as small an error probability as majority decoding in estimating the sums from the sets of orthogonal parity checks.

However, these sums are then combined with the original parity checks to form parity checks corresponding to the modified matrix H'. The set of orthogonal parity checks formed from H' no longer has the strict statistical independence that it would have if H' were the original parity-check matrix. It seems reasonable, however, to treat these orthogonal parity checks as though they enjoyed the independence needed for application of the APP decoding rule. If H' were incorrectly formed, and hence were not a true parity-check matrix, the decoding process would probably fail anyway.
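A minimal Python sketch of an APP decision for one set of orthogonal checks follows. It uses a weighted-threshold form of the rule derived from Bayes' rule, assuming independent noise bits. This is an illustrative form, not a quotation of the rule as stated earlier in the report, and the function names are hypothetical. The quantities are as follows:
- p_0 is the prior probability that the sum being decoded is 1.
- p_i is the probability that the other noise bits in check i add to 1, and q_i = 1 - p_i.
- Each weight is w_i = 2 log(q_i/p_i).

Applying the function at a later stage simply means feeding it checks formed from H'. That is exactly where the independence assumption discussed above becomes an approximation.

```python
import math


def parity_error_prob(probs):
    """Probability that an odd number of independent noise bits equal 1,
    given each bit's probability of being 1."""
    prod = 1.0
    for g in probs:
        prod *= 1.0 - 2.0 * g
    return (1.0 - prod) / 2.0


def app_decide(checks, prior, check_probs):
    """APP decision for one sum from a set of orthogonal parity checks.

    checks      -- observed values A_1..A_J (0 or 1) of the orthogonal checks
    prior       -- p_0, the probability that the sum being decoded is 1
    check_probs -- p_i, the probability that the other noise bits in check i
                   add to 1
    Assumes the J checks are independent given the decoded sum.
    """
    weights = [2.0 * math.log((1.0 - p) / p) for p in check_probs]
    w0 = 2.0 * math.log((1.0 - prior) / prior)
    threshold = 0.5 * (w0 + sum(weights))
    score = sum(w for a, w in zip(checks, weights) if a)
    return 1 if score > threshold else 0


if __name__ == "__main__":
    # One very reliable check outweighs a noisy one and a weak prior.
    print(app_decide([1, 0], 0.45, [0.01, 0.3]))  # 1
```

Consider the first stage of the (7,4) example on a binary symmetric channel with crossover probability 0.05. The prior for e2 + e3 and both checks s6, s7 have the same probability, parity_error_prob([0.05, 0.05]) = 0.095. All weights are therefore equal, and the rule coincides with majority decoding. With unequal reliabilities, as in the usage line above, it can decide differently.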

7.2  FIRST-ORDER  REED-MULLER  CODES

As  a  first  example  to  illustrate  how  the  generalized  orthogonalization  procedure  can 
be  applied,  we  prove  the  following  theorem. 

THEOREM 22: For any M, there exists a binary $(2^M, M+1)$ block code with $d = 2^{M-1}$ that can be 2-step orthogonalized.

PROOF 22: Consider the parity-check matrix [P:I], where the rows of P contain all (M+1)-tuples of odd weight three and greater. There are exactly $2^M - M - 1$ such rows, and hence $n - k = 2^M - M - 1$. This can be seen from the fact that half of the set of all $2^{M+1}$ (M+1)-tuples have odd weight, and P has as rows all of these (M+1)-tuples except the M+1 unit vectors. Since k = M + 1, we have $n = 2^M$, as stated in the theorem. We now show that such a code has minimum distance $d = 2^{M-1}$ and can be two-step orthogonalized.

Consider the number of parity checks orthogonal on e1 + e2 that can be formed. There are three kinds of rows of P with a "one" in the first column:
- For each row of P beginning with "1 0", there is a unique row beginning with "0 1" that is otherwise the same. The sum of these rows gives a parity check on e1 + e2 and no other information noise bits.
- For each row of P beginning with "1 1" and having weight 5 or more, there is a unique row beginning with "0 0" that is otherwise the same. The sum of these rows again gives a check on e1 + e2 and no other information noise bits.
- The rows of P beginning with "1 1" and having weight three all have their third "one" in disjoint positions. Each of these gives a check on e1 + e2 and one other distinct information noise bit.

Using this procedure, we can form as many parity checks orthogonal on e1 + e2 as there are rows of P with a "one" in the first column. This number is $2^{M-1} - 1$, since the remaining M positions of such a row must contain an even, nonzero number of "ones."
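The counting in this part of the proof can be checked mechanically. The Python sketch below uses hypothetical helper names. It builds P for a given M and forms the checks on e1 + e2 by the pairing just described. It then confirms that there are $2^{M-1} - 1$ such checks and that they are orthogonal on e1 + e2. Noise bits are labeled ('i', j) for the information noise bit e_{j+1} and ('p', r) for the parity noise bit of row r of P.

```python
from itertools import product


def reed_muller_P(M):
    """Rows of P: all (M+1)-tuples of odd weight three or greater."""
    return [row for row in product((0, 1), repeat=M + 1)
            if sum(row) % 2 == 1 and sum(row) >= 3]


def checks_on_e1_plus_e2(P):
    """Form parity checks on e1+e2 by the pairing described in the proof.

    Each check is returned as a set of noise-bit labels.
    """
    index = {row: r for r, row in enumerate(P)}
    checks = []
    for r, row in enumerate(P):
        if row[0] != 1:
            continue
        if row[1] == 0 or sum(row) >= 5:
            # Pair "1 0 x" with "0 1 x", or "1 1 x" (weight >= 5) with "0 0 x".
            partner = (0, 1 - row[1]) + row[2:]
            rows = [r, index[partner]]
        else:
            rows = [r]  # "1 1" followed by a single third "one"
        bits = set()
        for q in rows:
            for j, b in enumerate(P[q]):
                if b:
                    bits ^= {('i', j)}  # shared positions cancel modulo two
            bits ^= {('p', q)}
        checks.append(bits)
    return checks


if __name__ == "__main__":
    P = reed_muller_P(4)
    print(len(P), len(checks_on_e1_plus_e2(P)))  # 11 7
```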

From the symmetry of P, which has as rows all vectors of odd weight three and greater, it follows that $2^{M-1} - 1$ parity checks orthogonal on any sum of two information noise bits can be formed. We can then form a modified parity-check matrix [P':I]. To do so, we assume that $e_2+e_3, e_3+e_4, \ldots, e_{k-1}+e_k$ are now known and use them to eliminate variables from the original parity-check equations.

We observe that any sum of an even number of the variables $e_2, e_3, \ldots, e_k$ can be formed from the assumed known sums (for example, $e_2 + e_4 = (e_2+e_3) + (e_3+e_4)$). Since all the rows of P have odd weight, every parity check on e1 checks an even number of the variables $e_2, e_3, \ldots, e_k$. These can all be eliminated by using the assumed known sums.

Thus, using the rows of P beginning with a "one," we can form a P' with $2^{M-1} - 1$ rows of the form 1 0 0 ... 0. Hence $2^{M-1} - 1$ parity checks orthogonal on e1 can be formed from the modified parity-check matrix. Again, from the symmetry of P, it follows that a similar process can be applied to $e_2, e_3, \ldots, e_k$.

It remains to show that $d = J + 1 = 2^{M-1}$, where $J = 2^{M-1} - 1$ is the number of orthogonal parity checks formed above on each information noise bit. The minimum distance must be at least this great. On the other hand, d can be at most one greater than the number of "ones" in any column of P (cf. Lemma D.1), and there are J "ones" in each column. Hence d must be exactly J + 1, which proves the theorem.
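The minimum-distance claim can also be confirmed by brute force for small M. The Python sketch below uses a hypothetical function name. It rebuilds the same P, enumerates all nonzero information vectors, and returns the smallest code-word weight, which for a linear code equals d.

```python
from itertools import product


def min_distance(M):
    """Brute-force minimum distance of the systematic (2**M, M+1) code whose
    P has as rows all (M+1)-tuples of odd weight three or greater."""
    k = M + 1
    P = [row for row in product((0, 1), repeat=k)
         if sum(row) % 2 == 1 and sum(row) >= 3]
    best = None
    for info in product((0, 1), repeat=k):
        if not any(info):
            continue
        # Each parity bit is the modulo-two sum of the information bits its row checks.
        parity = [sum(a & b for a, b in zip(row, info)) % 2 for row in P]
        weight = sum(info) + sum(parity)
        best = weight if best is None else min(best, weight)
    return best


if __name__ == "__main__":
    print([min_distance(M) for M in (2, 3, 4)])  # [2, 4, 8]
```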

The codes described in the proof of Theorem 22 are equivalent to the first-order Reed-Muller codes,37 that is, they differ from the latter codes only by a permutation of the code positions. However, the codes as we have used them are in systematic form, which means that the first k encoded symbols are identical to the information symbols. This simplifies encoding. It is also usually desirable for other reasons; for example, some receiving stations may not have decoding circuits but may still wish to extract some data from the received messages. The Reed-Muller codes are not usually given in systematic form, because the Reed decoding algorithm13 does not apply to them directly, that is, without a prior linear transformation.

Finally, it should be observed that the threshold-decoding procedure can be streamlined in the following way. First, all k-1 sums $(e_1+e_2)^*, (e_2+e_3)^*, \ldots, (e_{k-1}+e_k)^*$ are determined, and $e_1^*$ is determined from the modified parity-check matrix. The remaining variables $e_2^*, e_3^*, \ldots, e_k^*$ can then all be found directly by combining these known quantities. For example, $e_3^* = e_1^* + (e_1+e_2)^* + (e_2+e_3)^*$. Thus a total of only k threshold-decoding decisions need be made in the entire decoding process.
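The streamlined combining step uses only modulo-two addition. Here is a minimal Python sketch; the function name is hypothetical, and it assumes the k - 1 adjacent sums are supplied in order.

```python
def recover_all(e1, adjacent_sums):
    """Recover e1*, e2*, ..., ek* from e1* and the decoded sums
    (e1+e2)*, (e2+e3)*, ..., (e_{k-1}+e_k)* by modulo-two addition alone."""
    estimates = [e1]
    for s in adjacent_sums:
        # e_{j+1}* = e_j* + (e_j + e_{j+1})*
        estimates.append(estimates[-1] ^ s)
    return estimates


if __name__ == "__main__":
    print(recover_all(1, [1, 0, 1]))  # [1, 0, 0, 1]
```

No threshold elements appear here; they are needed only for the k estimates that are passed in.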

In general, when threshold decoding with L-step orthogonalization of a code is possible, it is never necessary to make more than k threshold-decoding decisions in the entire decoding process. This follows from the fact that each decision gives the decoded estimate of some sum of the variables $e_1, e_2, \ldots, e_k$. Since there are only k of these variables, at most k linearly independent sums can be formed from them. If k + m estimates of sums are formed by threshold decoding in the decoding process, then at least m of these estimates could have been found by taking linear combinations of the other estimates. Conversely, fewer than k threshold-decoding decisions can never suffice for the entire decoding process, since there are k independent quantities to be estimated.
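The counting argument is a rank statement over GF(2). In the Python sketch below, the function name and the bit-mask encoding are illustrative. Each decoded sum is written as a k-bit mask, and Gaussian elimination over GF(2) counts how many of the sums are linearly independent. However many estimates are formed, the count never exceeds k.

```python
def gf2_rank(vectors):
    """Rank over GF(2) of sums of noise bits given as integer bit masks
    (bit j set means that e_{j+1} appears in the sum)."""
    basis = []  # each basis element has a leading bit not shared with the others
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears b's leading bit from v if it is set
        if v:
            basis.append(v)
    return len(basis)


if __name__ == "__main__":
    # e1+e2, e2+e3 and e1+e3: the third is the sum of the first two.
    print(gf2_rank([0b0011, 0b0110, 0b0101]))  # 2
```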

7.3  HAMMING  CODES 

We have shown that the (7,4), d=3, Hamming code can be 2-step orthogonalized. This code is one in a class of d=3 codes, discovered by Hamming,11 having $n = 2^M - 1$ and $k = 2^M - M - 1$. These codes are cyclic codes. They are also "sphere-packed" codes, which means that all $2^n$ possible received n-tuples are at distance $\lfloor (d-1)/2 \rfloor = 1$ or less from some code word. Any decoding algorithm that corrects a single error in a block is then a maximum-likelihood decoding algorithm for these codes on the binary symmetric channel (see Appendix B). Majority decoding with 2-step orthogonalization is thus a maximum-likelihood decoding algorithm for the (7,4) code.
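The sphere-packing property is easy to confirm for the (7,4) code by exhaustive search. The following Python sketch uses the rows of P from the example at the start of this section: s5 = 1 0 1 1, s6 = 1 1 1 0, s7 = 0 1 1 1. The helper names are hypothetical. The sketch checks that every one of the $2^7 = 128$ binary 7-tuples lies within distance 1 of exactly one code word, which is the counting identity $2^4 (1 + 7) = 2^7$.

```python
from itertools import product

# Rows of P for the (7,4) example (checks s5, s6, s7 on e1..e4).
P = [(1, 0, 1, 1), (1, 1, 1, 0), (0, 1, 1, 1)]


def codewords():
    """All 16 systematic code words: information bits followed by parity bits."""
    for info in product((0, 1), repeat=4):
        parity = tuple(sum(a & b for a, b in zip(row, info)) % 2 for row in P)
        yield info + parity


def distance(x, y):
    """Hamming distance between two equal-length tuples."""
    return sum(a != b for a, b in zip(x, y))


def covering_radius(words):
    """Largest distance from any 7-tuple to its nearest code word."""
    return max(min(distance(r, c) for c in words)
               for r in product((0, 1), repeat=7))


def neighbours_within_one(r, words):
    """Number of code words within distance 1 of the 7-tuple r."""
    return sum(1 for c in words if distance(r, c) <= 1)


if __name__ == "__main__":
    print(covering_radius(list(codewords())))  # 1
```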

We shall now show that all of the Hamming codes can be L-step orthogonalized for some L. It follows that majority decoding with L-step orthogonalization is a maximum-likelihood decoding procedure for these codes on a binary symmetric channel.

We do not suggest that threshold decoding with L-step orthogonalization is a desirable way to decode this class of codes. Very simple decoding procedures may be employed instead, such as the original procedure suggested by Hamming,11 or the procedure advanced by Huffman.38 Our purpose is only to show that the generalized threshold-decoding procedure can be applied to at least one class of high-rate codes. We conjecture that the same result will apply to other classes of high-rate codes.

THEOREM 23: For M = 2, 3, 4, ..., the Mth-order Hamming $(2^M - 1, 2^M - M - 1)$, d = 3, codes can be L-step orthogonalized for L no greater than M - 1.
PROOF 23: The Mth-order Hamming code has the parity-check matrix [P:I], where the columns of P contain all distinct nonzero M-tuples of weight two and greater. The M=2, or (3,1), code can be 1-step orthogonalized according to Theorem 21. The M=3, or (7,4), code can be 2-step orthogonalized, as we have shown. We now show by induction that the Mth-order code can be (M-1)-step orthogonalized.

Suppose, first, that e1 appears in an odd number of the original M parity checks, that is, that the first column of P has odd weight. Let $\pi_i$ be the ith row of P, and let $\gamma_i$ be the sum of the remaining M-1 rows of P. Both $\gamma_i$ and $\pi_i$ have "ones" in those positions, and only those, that correspond to columns of P with even weight and a "one" in the ith row. Every column of odd weight contributes a "one" to exactly one of the two. Thus $\gamma_i$ and $\pi_i$ form a set of d-1 = 2 parity checks orthogonal on the sum of the information noise bits in these positions. M such pairs of orthogonal parity checks can be formed, one for each i.

The assumed known sums can then be used to eliminate all "ones" from columns of even weight in P. The only nonzero columns are then the original set of $2^{M-1} - (M-1) - 1$ columns of P with odd weight. Then, by omitting the last row and the all-zero columns, P is transformed into a P' corresponding to the parity-check matrix of the (M-1)th-order Hamming code. e1 is still checked by the modified parity-check matrix, since its position corresponded to a column of P with odd weight.
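Both claims of the odd-weight case, and the count of remaining columns, can be checked by enumeration for small M. The Python sketch below uses hypothetical helper names and writes pi_i for row i of P and gamma_i for the sum of the other M - 1 rows. It tests column by column that each even-weight column lies in both pi_i and gamma_i or in neither, and that each odd-weight column lies in exactly one of them. It also confirms that the odd-weight columns, with the last row omitted, are precisely the columns of P for the (M-1)th-order Hamming code.

```python
from itertools import product


def hamming_P_columns(M):
    """Columns of P for the Mth-order Hamming code: all M-tuples of weight two or more."""
    return [c for c in product((0, 1), repeat=M) if sum(c) >= 2]


def pair_is_orthogonal(M, i):
    """Check the column-by-column claim about pi_i and gamma_i."""
    for c in hamming_P_columns(M):
        in_pi = c[i] == 1
        in_gamma = (sum(c) - c[i]) % 2 == 1
        if sum(c) % 2 == 0:
            if in_pi != in_gamma:  # even column: in both checks or in neither
                return False
        elif in_pi == in_gamma:  # odd column: in exactly one of the two
            return False
    return True


def reduces_to_lower_order(M):
    """Odd-weight columns with the last row omitted should give the columns
    of P for the (M-1)th-order Hamming code."""
    odd_cols = [c for c in hamming_P_columns(M) if sum(c) % 2 == 1]
    reduced = sorted(c[:-1] for c in odd_cols)
    return reduced == sorted(hamming_P_columns(M - 1))


if __name__ == "__main__":
    print(all(reduces_to_lower_order(M) for M in range(3, 8)))  # True
```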

Suppose, on the other hand, that e1 appears in an even number of the original M parity checks, that is, that the first column of P has even weight. Consider now the set of M assumed-known sums described in the preceding paragraph. These sums contain a total of $2^{M-1} - 1$ distinct noise bits, namely those information noise bits in positions corresponding to columns of P with even weight. These sums, by themselves, correspond to a modified parity-check matrix made up of all of the columns of even weight in P.

Omit any row of this matrix, that is, discard one of the sums. If there are only two modified parity checks on e1, omit a row that does not check on e1. The remaining M-1 sums then correspond to a modified parity-check matrix, which is that of an (M-1)th-order Hamming code. Its parity bits are the M-1 information noise bits that were checked in the omitted row and in only one other row.

Each application of this process reduces the order of the Hamming code by one, whichever case applies. After performing it a total of M-3 times, we reach the parity-check matrix of the third-order Hamming code, and e1 is still checked. This code can be 2-step orthogonalized. Thus d-1 = 2 parity checks orthogonal on e1 can be formed after an (M-1)-step orthogonalization procedure. (We have not proved that this is the minimum number of steps required.) Clearly, the same argument applies as well to e2, e3, ..., ek, and this proves the theorem.

7.4  BOSE-CHAUDHURI  CODES

The Hamming codes can be considered as a special case of the more general class of cyclic codes known as the Bose-Chaudhuri codes. (These codes were discovered independently by Hocquenghem.) Although we have been unable to prove that this entire class of codes can be L-step orthogonalized, we have verified that the codes of length 15 or less can be so orthogonalized. There are four such codes:
- The (7,4) and (15,11) codes are Hamming codes and can be 2-step and 3-step orthogonalized, respectively, according to Theorem 23.
- The (15,7) code can be 1-step orthogonalized, as shown in section 6.3.
- The remaining code, the (15,5), d=7, code, can be 2-step orthogonalized, as we now show.

The (15,5), d=7, Bose-Chaudhuri code has the parity-check matrix H = [P:I], where the rows of P are
corresponding to the information noise bits that are checked by parity checks s6 through s15, respectively. For this code, d-1 = 6 parity checks orthogonal on e2 + e3 can be formed, as shown in (191). Since the code is cyclic, d-1 = 6 parity checks orthogonal on e3 + e4 and on e4 + e5 can also be formed, as is easily verified directly.

Thus we may assume that these sums are known and use them to eliminate variables in the original parity-check equations. From these sums, any sum of an even number of the variables e2, e3, e4, and e5 can be formed. This permits all other information bits to be eliminated from the d-1 = 6 parity checks in Eq. 191 that check on e1. This process transforms H into an H' from which six parity checks orthogonal on e1 can be formed. Since the code is cyclic, the same procedure can be carried out for e2, e3, e4, and e5. Hence, the code can be 2-step orthogonalized.

To illustrate how L-step orthogonalization can be instrumented, we show in Fig. 27 the combinatorial element that would be used in the cyclic Type I decoder for the Bose-Chaudhuri (15,5) code. The remainder of the decoding circuit is the same as for the cyclic Type I decoder described in section 6.3 for cyclic codes that can be one-step orthogonalized.

The upper three majority elements in Fig. 27 are used to form the decoded estimates of e2 + e3, e3 + e4, and e4 + e5. These quantities are then treated as additional parity checks and are combined with the original parity checks to form a set of d-1 = 6 parity checks orthogonal on e1. These latter checks are then operated on by the fourth majority element to produce the decoded estimate of e1.

It should be observed that two levels of majority logic are required in this combinatorial element. In general, with L-step orthogonalization, L levels of threshold logic will be required. However, as explained in section 7.2, no more than k majority elements are ever required in the combinatorial element. Thus the number of majority elements need not grow exponentially with L, as it would if the L levels of majority elements had the structure of a full tree.

Fig. 27. Combinatorial element of a Cyclic Type I decoder for the (15,5) Bose-Chaudhuri code.


7.5  NONAPPLICABILITY  TO  CONVOLUTIONAL  CODES 

At first glance, it might appear that the L-step orthogonalization procedure could be used to enlarge the class of convolutional codes that can be threshold-decoded efficiently. Such is not the case, as we now show, at least for the important case in which the rate is the reciprocal of an integer, that is, $R = 1/n_0$.

The parity-check matrix $[H_\Delta : I]$ for an $R = 1/n_0$ convolutional code (cf. section 3.3) is such that the rows of $H_\Delta$ are given by
Here, the g's are the coefficients in the $(n_0 - 1)$ code-generating polynomials.

Now suppose that it is possible to construct a set of J parity checks orthogonal on some sum of information noise bits, say on $e_{a_1}^{(1)} + e_{a_2}^{(1)} + \cdots + e_{a_j}^{(1)}$. This means that it is possible to find J subsets of the rows of $H_\Delta$ such that the sum of the rows in each subset has a "one" in each of the j positions corresponding to the information noise bits $e_{a_1}^{(1)}, e_{a_2}^{(1)}, \ldots, e_{a_j}^{(1)}$, but any other position has a "one" in at most one of the J sums of rows. Assume that $a_1 < a_2 < \cdots < a_j$.

Now consider discarding all of the leftmost columns of $H_\Delta$ up to, but not including, the column corresponding to $e_{a_j}^{(1)}$. The rows of this modified matrix could then be combined to produce a set of J parity checks orthogonal on $e_{a_j}^{(1)}$. Because of the symmetry of each parity triangle, the array of coefficients in this modified matrix is exactly the same as the one obtained from $H_\Delta$ in a different way: discard, at the bottom of each parity triangle, as many rows as columns were discarded in the previous construction. It must then be possible to construct J parity checks orthogonal on $e_0^{(1)}$ directly from these rows of $H_\Delta$.
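The symmetry invoked here can be checked directly for a single parity triangle. The Python sketch below assumes the usual layout in which row i, column j of a triangle holds g_{i-j} for j ≤ i and zero above the diagonal; the helper names are hypothetical. It confirms that two deletions leave the same array:
- deleting the c leftmost columns, after which the top c rows are identically zero and are dropped;
- deleting the c bottom rows, after which the c rightmost columns are zero and are dropped.

The same comparison applies to each of the $n_0 - 1$ triangles of $H_\Delta$, since they share the same column indexing.

```python
def parity_triangle(g):
    """Lower-triangular parity triangle from generator coefficients g0..gm:
    entry (i, j) is g[i - j] for j <= i and 0 otherwise."""
    m = len(g) - 1
    return [[g[i - j] if j <= i else 0 for j in range(m + 1)] for i in range(m + 1)]


def drop_left_columns(T, c):
    """Discard the c leftmost columns; rows 0..c-1 are then all zero and are dropped too."""
    return [row[c:] for row in T[c:]]


def drop_bottom_rows(T, c):
    """Discard the c bottom rows; the c rightmost columns are then all zero and are dropped too."""
    size = len(T) - c
    return [row[:size] for row in T[:size]]


if __name__ == "__main__":
    T = parity_triangle([1, 0, 1, 1])
    print(drop_left_columns(T, 2) == drop_bottom_rows(T, 2))  # True
```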

In other words, it is always possible to form directly as many parity checks orthogonal on $e_0^{(1)}$ as can be formed on any sum of information noise bits. Thus, if L-step orthogonalization is possible, one-step, or complete, orthogonalization is also possible. Hence there is no advantage in the more general orthogonalization procedure for convolutional codes with rate $R = 1/n_0$.


7.6  SUMMARY 

By extending the manner in which orthogonalization of parity checks is performed, we have seen that threshold decoding can be applied efficiently to at least one high-rate class of codes, as well as to low-rate codes. We have called this generalized procedure L-step orthogonalization.

The ultimate capability of L-step orthogonalization has not yet been determined, but the results obtained thus far have been most encouraging. We have not yet investigated a block (n,k) code for which it could be shown that L-step orthogonalization is impossible. Unfortunately, this may be due to the fact that the codes that we have studied were necessarily of short length, rather than to the generality of the method.

The circuitry required to implement threshold decoding with L-step orthogonalization is more complex than that required when one-step orthogonalization is possible. Even so, the circuits are still simple enough to be of practical interest, especially in the case of cyclic codes. The set of n-k parity checks can always be formed by a replica of the encoding circuit (cf. section 2.6). These parity checks can then be considered as the inputs to a combinatorial network having k outputs, namely $e_1^*, e_2^*, \ldots, e_k^*$, the decoded estimates of the information noise bits. These quantities can then be added (modulo-two) to the received information bits to form the decoder output.

If the code can be L-step orthogonalized, the complete combinatorial network need contain no more than k threshold elements and at most (d)(k) modulo-two adders. This can be seen in the following way:
- From section 7.2, we conclude that no more than k threshold elements are needed.
- Adders are needed to form the set of orthogonal parity checks for each threshold element. Since no more than d-1 inputs are required for each threshold element in order to correct any combination of $\lfloor (d-1)/2 \rfloor$ or fewer errors, no more than a total of k(d-1) adders will be required to form these inputs.
- Adders are also needed to combine the outputs of the threshold elements into the decoded estimates of the k information noise bits. Since each such estimate is the sum of the outputs of some set of the threshold elements, one adder suffices for each of these quantities.

Thus a total of (d-1)k + k = dk adders always suffices in the complete combinatorial network.
VIII.  CONCLUSIONS  AND  RECOMMENDATIONS  FOR  FURTHER  RESEARCH
8.1  SUMMARY  OF  RESEARCH

In Section II we reviewed the properties of convolutional codes and proved some generalizations of known results for this class of codes. With this exception, all of this report has been devoted to the investigation of a decoding procedure for linear codes, which we have called "threshold decoding."

In Section I, two forms of threshold decoding were formulated: majority decoding and APP decoding. These are both methods by which some noise bit, say $e_m$, can be estimated from a set of parity checks orthogonal on that noise bit. The majority decoding rule is based entirely on minimum distance; that is, it assigns to $e_m$ the value that it has in the noise pattern of minimum weight that satisfies the set of orthogonal parity checks. The APP decoding rule, on the other hand, is based on a probability metric; that is, it assigns to $e_m$ the value that is most probable, given the particular values of the orthogonal parity checks. Although these decoding rules are ordinarily distinct, we have seen (Sec. 5.1c) that majority decoding can be considered as the limiting case of APP decoding when the channel is estimated to have vanishingly small error probability.

In  Sections  III,  IV,  and  V  threshold  decoding  was  applied  to  the  specific  task  of 
decoding  convolutional  codes.  First,  it  was  necessary  to  construct  codes  to  which  the 
threshold  decoding  rules  could  be  efficiently  applied.  Bounds  on  code  quality  were 
formulated to guide this search. Actual codes were then constructed by both trial-and-error and analytical techniques. This research is reported in Section III. In Section IV, simple decoding circuits were developed for the implementation of threshold decoding with convolutional codes. Finally, in Section V, data on error probability for threshold decoding of convolutional codes were presented for the binary symmetric channel,
the  binary  erasure  channel,  and  the  Gaussian  channel.  This  concluded  the  treatment 
of  convolutional  codes. 

The application of threshold decoding to block linear codes was studied in Sections VI and VII. In Section VI we saw that the basic threshold-decoding algorithms could be applied efficiently only to a small number of interesting block codes. To obviate this difficulty, a more general procedure for forming sets of orthogonal parity checks was introduced in Section VII. This generalization permitted efficient decoding of a somewhat larger class of block codes by iterating threshold-decoding decisions.

8.2  CONCLUSIONS 

On the basis of the research described in this report, the following conclusions regarding threshold decoding can be stated:
a.  Convolutional  Codes

(i)  Threshold  decoding  is  easily  instrumented  (when  applicable) 

The decoding circuit need contain only $n_A - n_0$ stages of shift register, which appears to be the minimum one can hope to achieve in a reasonable decoding network. Moreover, the only nonlinear elements in the decoding circuit are of a very simple type, namely, threshold logical elements. Even the circuitry necessary to compute the time-variant weighting factors (when applicable) for the threshold elements is not prohibitively complex.

(ii)  Decoding  performance  is  satisfactory  for  codes  up  to  100  bits  in  length 

For very low-rate codes (cf. secs. 3.6 and 3.7), good error-correcting performance can be obtained for long codes. However, over the range of rates of most practical interest at present, good error-correcting performance is limited to codes of approximately 100 transmitted bits in length. For such codes, the error probabilities that can be attained by threshold decoding are competitive with those obtained by other known error-correcting means.

(iii)  The  error  probability  cannot  be  made  arbitrarily  small  at  the  receiver  for  a 
fixed  rate  of  data  transmission 

This negative result establishes the lack of generality of threshold decoding for convolutional codes. We were able to show rigorously that this negative result obtains for the case $R = 1/2$, but it seems plausible that it also applies to all other rates. In particular, for both the binary symmetric channel and the binary erasure channel, we have demonstrated the impossibility of making the decoding error probability arbitrarily small when $R = 1/2$.

b.  Block  Codes 

(i)  Threshold  decoding  is  easily  instrumented  (when  applicable) 

No more than k threshold logical elements are required in the complete decoding circuit for a block (n,k) code. The remainder of the decoding circuit contains a modest amount of linear components, namely, a replica of the encoding circuit and no more than (k)(d) modulo-two adders, where d is the minimum distance of the code. In the important special case of cyclic codes, even simpler decoding circuits are possible.

(ii)  Several  interesting  classes  of  codes  can  be  efficiently  decoded 

The maximal-length codes, the first-order Reed-Muller codes, and the Hamming codes can all be L-step orthogonalized. This means that for these codes any error pattern of weight $\lfloor (d-1)/2 \rfloor$ or less is correctable by majority decoding. It was shown in Sections VI and VII that several isolated, but interesting, codes could also be L-step orthogonalized.
c.  General  Remark


(i)  Threshold  decoding  is  limited  primarily  to  binary  codes 

In section 6.2 we saw that the manner in which orthogonal parity checks are formed from the original parity checks of the code is such that full advantage cannot be taken of the higher-order, or nonbinary, alphabets.

(ii)  Error-correction  is  not  completely  restricted  by  the  minimum  distance  of  the 
code 

In general, the threshold-decoding algorithms permit the correction of many error patterns of weight greater than $\lfloor (d-1)/2 \rfloor$. This is a very important feature in low-rate codes.

(iii)  Error-correction  can  be  improved  when  the  a  posteriori  error  probabilities  of 
the  received  bits  are  known 

In contrast to purely algebraic decoding procedures, the APP decoding rule makes use of the a posteriori probabilities of the received bits to decrease the probability of a decoding error. This feature of threshold decoding can be expected to be of special importance when decoding for a fading channel. In this respect, threshold decoding is similar to Gallager's low-density parity-check decoding,8 which is also a composite of algebraic and probabilistic techniques.

8.3  RECOMMENDATIONS  FOR  FURTHER  RESEARCH 

Further  research  in  continuation  of  that  reported  here  is  especially  recommended 
in  the  following  areas: 

(i)  Construction of good systematic classes of convolutional codes for random-error correction

Aside from certain classes of codes constructed for burst-error correction,40 the only systematic classes of convolutional codes known are the uniform codes and the Reed-Muller-like codes that were found in the research for this report. Unfortunately, these are both classes of low-rate codes. It would be desirable to have several classes of good codes that might be used with threshold decoding (or other decoding algorithms).

(ii)  Investigation  of  error-propagation  with  convolutional  codes 

It does not seem worthwhile to expend additional effort on the study of the general
properties  of  threshold  decoding  with  convolutional  codes.  The  essential  features  are 
quite  clear  from  Theorems  10-12.  Similarly,  it  does  not  seem  likely  that  simpler 
decoding  circuits  can  be  found.  On  the  other  hand,  certain  other  features  warrant 
additional  research,  and  chief  among  these  is  the  question  of  error  propagation. 

Two methods of controlling error propagation, resynchronization and error counting,
were discussed in section 3.1. There is also a third possibility: automatic recovery by
the decoder after a short burst of erroneously decoded symbols. Such automatic recovery
seems to be possible with sequential decoding, at least at high rates. However, the large
increase in decoding computation that accompanies a decoding error with sequential
decoding limits the practical use of automatic recovery to combat error propagation.
Threshold decoding does not share this limitation, since its computational effort is
always fixed. It seems advisable to make a simulation study, using a fairly long code
(e.g., the $n_A = 104$ code of Table II), to determine whether automatic recovery from
an error is possible. If so, an analytic effort would be in order to prove that the burst
of erroneous symbols has some small average length.

(iii)  Study  of  other  nonlinear  functions  for  use  in  decoding  convolutional  codes 

Any  combinatorial  element  which  can  form  the  decoded  estimate  of  can  be  used 
as  the  decision  element  in  the  Type  I  decoder  of  Fig.  11,  by  replacing  the  threshold 
element  and  its  associated  adders.  It  might  be  possible  to  find  other  simple  nonlinear 
functions  that  could  be  used  to  decode  certain  classes  of  convolutional  codes  efficiently, 
but  this  will  doubtless  be  a  very  difficult  area  for  research. 

(iv)  Investigation  of  the  generality  of  L-step  orthogonalization  for  block  linear  codes 

This is the most important area for additional research. Theorems 21 and 23 suggest
the possibility of a general theorem of the nature that "any binary code of length
$2^M - 1$ or less can be $(M-1)$-step orthogonalized." Such a result, if it were true, would
be extremely important. It would mean that any block code could be decoded at least up
to its minimum distance, that is, with correction of any error pattern of weight
$\lfloor (d-1)/2 \rfloor$ or less, in one operation by a combinatorial circuit consisting of no more
than k threshold elements, an encoder, and no more than (k)(d) modulo-two adders.
Research to prove this conjecture, or to provide a counterexample, could be of
considerable value.
APPENDIX A

BASIC  DEFINITIONS  AND  PROPERTIES  OF  MODERN  ALGEBRA 


An Abelian group is a set of elements and a rule of composition (which we write
here as addition) such that for any three elements a, b, and c in the set the following
axioms are satisfied: (1) a + b is in the set. (2) (a+b) + c = a + (b+c). (3) There is
a unique element, 0, in the set which is such that a + 0 = a. (4) There is a unique
element, -a, in the set which is such that a + (-a) = 0. (5) a + b = b + a. We shall
always use the term "group" to mean an Abelian group.

A  subgroup  is  a  subset  of  the  elements  of  a  group  which  itself  forms  a  group  with 
respect  to  the  same  rule  of  composition. 

If H is a subgroup of a group G, and if a is any element of G, then the coset
containing a modulo H is the set of all elements b of G such that a - b is in the
subgroup H. The coset containing 0 is the subgroup itself. Any other coset will be
called a proper coset.
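To make the coset definition concrete, here is a small sketch (an illustration, not taken from the text) using the group of binary 3-tuples under componentwise mod-two addition, encoded as Python integers combined with XOR. The subgroup H chosen below, the even-weight 3-tuples, is an arbitrary example; because every element of this group is its own inverse, a - b is simply `a ^ b`.

```python
# Elements of G: all binary 3-tuples, encoded as the integers 0..7;
# the rule of composition is XOR (componentwise mod-two addition).
G = range(8)

def add(a, b):
    """Rule of composition: componentwise mod-two addition."""
    return a ^ b

# Example subgroup H: the even-weight 3-tuples 000, 011, 101, 110.
H = frozenset({0b000, 0b011, 0b101, 0b110})

# H contains 0 and is closed under addition, so it is a subgroup
# (in this group -a = a, so closure under inverses is automatic).
assert 0 in H and all(add(a, b) in H for a in H for b in H)

def coset(a):
    """Coset containing a modulo H: all b in G such that a - b (= a XOR b) is in H."""
    return frozenset(b for b in G if add(a, b) in H)

cosets = {coset(a) for a in G}
for c in sorted(cosets, key=min):
    kind = "subgroup" if 0 in c else "proper coset"
    print(kind, sorted(format(x, "03b") for x in c))
```

The two cosets printed partition G into blocks of equal size, which is the property the group partitions of Appendix B rely on.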

A ring is an additive Abelian group for which a second rule of composition (which
we write as multiplication) is defined in such a way that for any elements a, b, and c
of the group, the following axioms are satisfied: (1) ab is in the group. (2) a(bc) =
(ab)c. (3) a(b+c) = ab + ac. (4) (b+c)a = ba + ca. The ring is commutative if ab = ba.

A field is a ring in which the nonzero elements form an Abelian group with respect
to multiplication. For $q = p^k$, where p is any prime number and k is any positive integer,
it can be shown that there exists a field containing q elements. This field is called the
Galois field of q elements and is denoted by GF(q).

The set of all polynomials in a single indeterminate, D, with coefficients in
GF(q) forms a commutative ring called the ring of polynomials with coefficients in GF(q).

The  ideal  generated  by  a  polynomial  f(D)  is  the  set  of  all  polynomials  of  the  form 
h(D)f(D),  where  h(D)  is  any  polynomial. 

The  residue  class  containing  g(D)  modulo  the  ideal  generated  by  f(D)  is  the  set  of 
all  polynomials  h(D)  such  that  g(D)  -  h(D)  is  in  the  ideal. 

The set of residue classes modulo the ideal generated by f(D) forms a ring called the
residue-class ring modulo the ideal. (This property implies that the sum, or product,
of polynomials from two residue classes is always in the same third residue class
regardless of the particular choice of those polynomials.)
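The residue-class construction can be sketched for coefficients in GF(2), i.e., q = 2. In this illustration (not from the text), a polynomial is stored as a Python integer whose bit i is the coefficient of D^i, and the modulus f(D) = 1 + D + D^3 is an arbitrary choice. The checks confirm that sums and products do not depend on which members of the residue classes are used. They also illustrate a standard fact not stated above: when f(D) is irreducible of degree m, the residue-class ring is a field with 2^m elements (GF(8) here), whereas a reducible modulus such as 1 + D^2 leaves classes without inverses. The helper names are hypothetical.

```python
def poly_mul(a, b):
    """Product a(D) b(D) with coefficients in GF(2) (carry-less multiplication)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def poly_mod(a, f):
    """Remainder of a(D) on division by f(D); it labels the residue class of a(D)."""
    deg_f = f.bit_length() - 1
    while a and a.bit_length() - 1 >= deg_f:
        a ^= f << (a.bit_length() - 1 - deg_f)
    return a

def same_class(g, h, f):
    """g(D) and h(D) are in the same residue class iff g(D) - h(D) is in the ideal of f(D)."""
    return poly_mod(g ^ h, f) == 0      # over GF(2), subtraction is XOR

def inverse(a, f):
    """Multiplicative inverse of the residue class of a(D) modulo f(D), or None."""
    deg_f = f.bit_length() - 1
    for b in range(1, 1 << deg_f):
        if poly_mod(poly_mul(a, b), f) == 1:
            return b
    return None

F = 0b1011                      # f(D) = 1 + D + D^3 (example modulus)
g, h = 0b110, 0b011             # D + D^2 and 1 + D
g2 = g ^ poly_mul(0b101, F)     # g(D) + (1 + D^2) f(D): same class as g(D)
h2 = h ^ F                      # h(D) + f(D): same class as h(D)

# Sum and product classes do not depend on the representatives chosen.
print(same_class(g ^ h, g2 ^ h2, F), same_class(poly_mul(g, h), poly_mul(g2, h2), F))
```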

(The  definitions  and  properties  given  here  can  be  found  in  any  text  on  modern  algebra 
such  as  Garrett  Birkhoff  and  Saunders  Mac  Lane,  A  Survey  of  Modern  Algebra, 
Macmillan  Company,  New  York,  1941.) 
APPENDIX B

PROOF OF THEOREM 8 (RANDOM-CODING BOUND)


A group partition of an (n,k) code was defined in section 2.5 as a mapping of the code
words into a subgroup H' and its proper cosets $C_1', C_2', \ldots, C_N'$ corresponding to a
mapping of the set of $2^k$ information sequences into a subgroup H and its proper cosets
$C_1, C_2, \ldots, C_N$. We assume the existence of an ensemble of (n, k) codes in which each
information sequence in a proper coset has probability $2^{-(n-k)}$ of being assigned any
one of the $2^{n-k}$ possible parity sequences.

Let $I_j$ be any information sequence. A code word with this information sequence
will be denoted $I_j'$. Suppose that a particular code is used over a binary symmetric
channel with transition probability $p_0 < 1/2$, and let $q_0 = 1 - p_0$. Then, given a
received sequence $R'$, the probability $P(I_j' \mid R')$ that $I_j'$ was transmitted satisfies,
for equally likely information sequences,

$$P(I_j' \mid R') \propto p_0^{\,w}\, q_0^{\,n-w}, \tag{B-1}$$

where w is the number of positions in which $I_j'$ and $R'$ differ. The maximum likelihood
rule is to choose as the transmitted information sequence that $I_j$ for which the probability
$P(I_j' \mid R')$ is greatest. Since the probability in Eq. B-1 decreases monotonically with w
(because $p_0 < q_0$), the maximum likelihood rule reduces to choosing as the transmitted
sequence that $I_j$ for which $I_j'$ and $R'$ differ in the fewest positions.
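A brute-force sketch of this reduction follows, using a small hypothetical (7, 3) systematic code whose P columns are chosen only for illustration. For every possible received sequence, the information sequence that maximizes $p_0^w q_0^{n-w}$ (Eq. B-1) is the same one that minimizes the distance w. Ties are broken identically by both rules because Python's `max` and `min` each return the first optimal item in the same iteration order.

```python
from itertools import product

# Hypothetical example: columns of P for H = [P : I], with k = 3 and n - k = 4.
P_COLS = [(1, 1, 0, 1), (1, 0, 1, 1), (0, 1, 1, 1)]

def encode(info):
    """Systematic code word: information bits followed by the mod-two sum of the selected P columns."""
    parity = (0, 0, 0, 0)
    for bit, col in zip(info, P_COLS):
        if bit:
            parity = tuple(p ^ c for p, c in zip(parity, col))
    return tuple(info) + parity

CODE = {info: encode(info) for info in product((0, 1), repeat=3)}

def distance(x, y):
    """Number of positions in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def likelihood(word, received, p0):
    """p0**w * q0**(n - w) from Eq. B-1, where w is the distance between word and received."""
    w = distance(word, received)
    return p0 ** w * (1 - p0) ** (len(received) - w)

def ml_decode(received, p0):
    """Maximum likelihood rule: the information sequence with the largest Eq. B-1 value."""
    return max(CODE, key=lambda info: likelihood(CODE[info], received, p0))

def min_distance_decode(received):
    """Choose the information sequence whose code word differs from received in the fewest positions."""
    return min(CODE, key=lambda info: distance(CODE[info], received))
```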

We shall now calculate P(e), the average probability of error in deciding to which
coset the transmitted information sequence belongs, with maximum likelihood decoding
assumed. Because the set of distances from any code word to the code words in all other
cosets is just equal to the set of weights of the code words in the proper cosets, we may
assume without loss of generality that the all-zero sequence, $I_0'$, was transmitted.

Let $I_j$ be any information sequence in a proper coset, and assume that a noise pattern
of weight W' has occurred on the channel. Let $W_j$ be the number of information positions
in which the noise pattern differs from $I_j$, and let $W_j'$ be the total number of positions in
which $I_j'$ differs from the noise pattern. Since the all-zero sequence was transmitted,
the received sequence differs from the transmitted sequence in W' positions. Over
the ensemble of codes, the probability $P(W_j' \le W')$ that $I_j'$ is at least as close to the
received sequence as $I_0'$ is given by

$$P(W_j' \le W') = 2^{-(n-k)} \sum_{i=0}^{W'-W_j} \binom{n-k}{i}, \tag{B-2}$$

as can be seen from the fact that the summation gives the total number of parity
sequences that, when attached to $I_j$, form code words differing in W' or fewer positions
from the noise pattern, and each of these parity sequences has probability $2^{-(n-k)}$ by
assumption.
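Equation B-2 can be checked by exhaustive enumeration for small parameters. The sketch below is illustrative only (the block length n = 6 and k = 2 are arbitrary, and the function names are hypothetical). For every noise pattern and every information sequence, it counts the fraction of the $2^{n-k}$ equally likely parity sequences that yield a code word within W' of the noise pattern and compares that fraction with the right-hand side of (B-2).

```python
from itertools import product
from math import comb

def enumerated_probability(info, noise, k):
    """Fraction of all 2**(n-k) parity sequences p for which (info, p) differs
    from the noise pattern in at most W' = weight(noise) positions."""
    n, w_prime = len(noise), sum(noise)
    hits = 0
    for parity in product((0, 1), repeat=n - k):
        word = tuple(info) + parity
        if sum(a != b for a, b in zip(word, noise)) <= w_prime:
            hits += 1
    return hits / 2 ** (n - k)

def formula_b2(info, noise, k):
    """Right-hand side of (B-2); the sum is empty (zero) when W' < W_j."""
    n, w_prime = len(noise), sum(noise)
    w_j = sum(a != b for a, b in zip(info, noise[:k]))
    return sum(comb(n - k, i) for i in range(w_prime - w_j + 1)) / 2 ** (n - k)
```

Both functions divide an integer count by the same power of two, so the comparison can be exact.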

The probability of error $P(e \mid W')$, given that the noise pattern has weight W', is the
probability that some code word in a proper coset has $W_j'$ less than or equal to W', and
this probability is overbounded by the sum of the probabilities that each $W_j'$ is less than
or equal to W':

$$P(e \mid W') \le \sum_{\substack{\text{all } I_j \text{ in the}\\ \text{proper cosets}}} 2^{-(n-k)} \sum_{i=0}^{W'-W_j} \binom{n-k}{i}. \tag{B-3}$$


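The right-hand side is overbounded by summing over all possible $W_j$, there being
$\binom{k}{W_j}$ information sequences that differ from the information part of the noise
pattern in exactly $W_j$ positions:

$$P(e \mid W') \le \sum_{W_j=0}^{k} \binom{k}{W_j}\, 2^{-(n-k)} \sum_{i=0}^{W'-W_j} \binom{n-k}{i} = 2^{-(n-k)} \sum_{i=0}^{W'} \binom{n}{i}, \tag{B-4}$$

where the final equality follows from Vandermonde's identity,
$\sum_{W_j} \binom{k}{W_j}\binom{n-k}{t-W_j} = \binom{n}{t}$, applied for each $t \le W'$.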


Also, $P(e \mid W')$ is always overbounded by unity, since it is a probability. Thus, the
average probability of error P(e), which is given by

$$P(e) = \sum_{\text{all } W'} P(W')\, P(e \mid W') = \sum_{W'=0}^{n} \binom{n}{W'} p_0^{W'} q_0^{\,n-W'}\, P(e \mid W'), \tag{B-5}$$

can be overbounded as

$$P(e) \le \sum_{W'=0}^{n} \binom{n}{W'} p_0^{W'} q_0^{\,n-W'} \min\left[1,\; 2^{-(n-k)} \sum_{i=0}^{W'} \binom{n}{i}\right]. \tag{B-6}$$

Inequality (B-6) is the usual form of the random-coding bound.
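The bound (B-6) is straightforward to evaluate numerically. The sketch below (function names hypothetical, parameters arbitrary) accumulates the partial sums of binomial coefficients once, caps the conditional bound from (B-4) at unity, and weights it by the binomial distribution of W'. A second helper evaluates the middle member of (B-4) directly, so the Vandermonde step can be checked.

```python
from math import comb

def random_coding_bound(n, k, p0):
    """Right-hand side of (B-6) for an (n, k) code on a BSC with transition probability p0."""
    q0 = 1 - p0
    total = 0.0
    cumulative = 0                      # running value of sum_{i=0}^{W'} C(n, i)
    for w in range(n + 1):              # w plays the role of W'
        cumulative += comb(n, w)
        conditional = min(1.0, cumulative / 2 ** (n - k))   # bound on P(e | W') from (B-4)
        total += comb(n, w) * p0 ** w * q0 ** (n - w) * conditional
    return total

def b4_double_sum(n, k, w):
    """Middle member of (B-4) without the factor 2**-(n-k); empty inner sums count as zero."""
    return sum(comb(k, wj) * sum(comb(n - k, i) for i in range(w - wj + 1))
               for wj in range(k + 1))

print(random_coding_bound(63, 32, 0.02))
```

With n = k (no parity digits) every conditional term is capped at 1, so the bound degenerates to 1, a convenient sanity check.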

Several authors [1, 3, 4] have derived asymptotic bounds on (B-6). The tightest such
bounds are those given by Wozencraft, and we shall now state his results without proof.

The bounds are stated in terms of the parameters $p_t$, $p_{crit}$, and $R_{crit}$, where

$$R = 1 - H(p_t), \tag{B-7}$$

$$\frac{p_{crit}}{1 - p_{crit}} = \sqrt{\frac{p_0}{q_0}}, \tag{B-8}$$

$$R_{crit} = 1 - H(p_{crit}). \tag{B-9}$$

With this notation, (B-6) can be expressed in asymptotic form as

$$P(e) < A_{crit}\, 2^{-nE_{crit}} + A_t\, 2^{-nE_t}, \qquad R < R_{crit}, \tag{B-10}$$

$$P(e) < (A_r + A_t)\, 2^{-nE_t}, \qquad R_{crit} \le R < C = 1 - H(p_0), \tag{B-11}$$
where

$$E_{crit} = H(p_0) + H(p_t) - 2H(p_{crit}) + (p_{crit} - p_0)\log_2\frac{q_0}{p_0}, \tag{B-12}$$

$$E_t = H(p_0) - H(p_t) + (p_t - p_0)\log_2\frac{q_0}{p_0}, \tag{B-13}$$

and $A_{crit}$, $A_r$, and $A_t$ are the coefficients given by Eqs. B-14 through B-16.


The first derivation of asymptotic forms for (B-6) was given by Elias [3], who showed
that the exponent in (B-11) was the same as the exponent in the lower bound on P(e)
obtained by "sphere-packing" arguments. This exponent can be obtained geometrically
by the construction shown in Fig. B-1. In this figure, $p_{crit}$ is that value for which the
tangent to the entropy curve has a slope that is one-half that at $p_0$. Elias credits
Shannon with this interesting geometric interpretation of the exponent in the expression
for P(e).

Fig. B-1. Geometric interpretation of exponent in expression for P(e).
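The asymptotic parameters can be computed directly from (B-7), (B-8), (B-9), (B-12), and (B-13). The sketch below is an illustration under stated assumptions: $p_t$ is found by bisection on $[0, 1/2]$, where $1 - H(p)$ is decreasing, and the function names are hypothetical. Three checks follow from the formulas themselves. The entropy slope $\log_2(q/p)$ at $p_{crit}$ is half the slope at $p_0$, which is the geometric property of Fig. B-1. The two exponents coincide at $R = R_{crit}$. Finally, substituting (B-8) into (B-12) gives the identity $E_{crit} = R_0 - R$, with $R_0 = 1 - \log_2(1 + 2\sqrt{p_0 q_0})$.

```python
from math import log2, sqrt

def H(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def entropy_slope(p):
    """dH/dp = log2((1 - p) / p), the slope of the entropy curve in Fig. B-1."""
    return log2((1 - p) / p)

def p_t_for_rate(R, tol=1e-13):
    """Solve R = 1 - H(p_t) for 0 <= p_t <= 1/2 by bisection (Eq. B-7)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 1 - H(mid) > R:      # 1 - H(p) decreases on [0, 1/2], so p_t lies above mid
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def asymptotic_parameters(R, p0):
    """Return p_t, p_crit, R_crit, E_t, E_crit from (B-7)-(B-9), (B-12), and (B-13)."""
    q0 = 1 - p0
    p_t = p_t_for_rate(R)
    p_crit = sqrt(p0) / (sqrt(p0) + sqrt(q0))      # solves (B-8)
    R_crit = 1 - H(p_crit)
    E_t = H(p0) - H(p_t) + (p_t - p0) * log2(q0 / p0)
    E_crit = H(p0) + H(p_t) - 2 * H(p_crit) + (p_crit - p0) * log2(q0 / p0)
    return p_t, p_crit, R_crit, E_t, E_crit
```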


APPENDIX C


PROOF  OF  THEOREMS  10  AND  11 

1.  Proof  of  Theorem  10 

Consider the parity-check matrix $[H_\Delta : I]$ of a rate $R = 1/n_0$ convolutional code.
$H_\Delta$ consists of a column of $n_0 - 1$ parity triangles, each of which corresponds to one
of the code-generating polynomials. We have


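$$
H_\Delta =
\begin{bmatrix}
g_0^{(2)} & & & \\
g_1^{(2)} & g_0^{(2)} & & \\
\vdots & & \ddots & \\
g_m^{(2)} & \cdots & g_1^{(2)} & g_0^{(2)} \\
\vdots & & & \vdots \\
g_0^{(n_0)} & & & \\
g_1^{(n_0)} & g_0^{(n_0)} & & \\
\vdots & & \ddots & \\
g_m^{(n_0)} & \cdots & g_1^{(n_0)} & g_0^{(n_0)}
\end{bmatrix}
\tag{C-1}
$$

where the blank entries above the diagonal of each parity triangle are zero.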


Suppose, for example, that $g_{a_1}^{(2)}, g_{a_2}^{(2)}, \ldots, g_{a_N}^{(2)}$ are all of the nonzero coefficients
of $G^{(2)}(D)$ in increasing order. Then from Eq. C-1 it can be seen that the parity checks
in the first parity triangle which check on $e_0^{(1)}$ are $s_{a_1}^{(2)}, s_{a_2}^{(2)}, \ldots, s_{a_N}^{(2)}$, and these parity
checks check on the following noise bits:

$$
\begin{aligned}
s_{a_1}^{(2)} &\text{ checks on } e_0^{(1)},\; e_{a_1}^{(2)}\\
s_{a_2}^{(2)} &\text{ checks on } e_0^{(1)},\; e_{a_2-a_1}^{(1)},\; e_{a_2}^{(2)}\\
&\;\;\vdots\\
s_{a_N}^{(2)} &\text{ checks on } e_0^{(1)},\; e_{a_N-a_{N-1}}^{(1)},\; \ldots,\; e_{a_N-a_1}^{(1)},\; e_{a_N}^{(2)}
\end{aligned}
\tag{C-2}
$$


The manner in which the information noise bits enter into these parity checks can be
interpreted geometrically as shown in Fig. C-1. In Fig. C-1, $a_1$, $a_2$, and $a_3$ represent
the first three nonzero terms in $G^{(2)}(D)$. X and Y correspond to the information
positions, excluding the first position which corresponds to $e_0^{(1)}$, that are checked by
$s_{a_3}^{(2)}$. We observe that by choosing $a_3$ sufficiently large the noise bit corresponding
to X can be made to have as large a subscript as desired, and all of the other noise
bits checked will have larger subscripts.


Fig. C-1. Geometric interpretation of parity-check structure.


From (C-2), we see that the i-th parity check on $e_0^{(1)}$ in each parity triangle checks
a total of i noise bits with $e_0^{(1)}$ excluded; thus a total of

$$\sum_{i=1}^{N} i = \frac{N(N+1)}{2} \tag{C-3}$$

noise bits are checked by the parity checks on $e_0^{(1)}$ in a parity triangle corresponding
to a code-generating polynomial with N nonzero coefficients.

The construction of J parity checks orthogonal on $e_0^{(1)}$ can proceed in the following
manner. One nonzero term at a time is placed into $G^{(2)}(D), G^{(3)}(D), \ldots, G^{(n_0)}(D)$,
in that order (cyclically), until a total of J nonzero terms have been so placed. The j-th
term is placed into $G^{(i)}(D)$ in such a way that the lowest-order information noise bit
checked by $s_{a_j}^{(i)}$, namely $e_{a_j - a_{j-1}}^{(1)}$ (with $a_{j-1}$ the preceding nonzero term of $G^{(i)}(D)$),
has subscript one greater than any information noise bit that is checked by any of the
parity checks already formed that check $e_0^{(1)}$. This can always be done, as explained
above. In this manner, the J parity checks formed that check on $e_0^{(1)}$ constitute a set
of J parity checks orthogonal on $e_0^{(1)}$.

The effective constraint length, $n_E$, is one plus the number of noise bits with $e_0^{(1)}$
excluded that are checked by these parity checks on $e_0^{(1)}$.

Let

$$J = Q(n_0 - 1) + r, \qquad 0 \le r < n_0 - 1. \tag{C-4}$$


Then, after our construction, there are Q + 1 parity checks on $e_0^{(1)}$ in each of the first
r parity triangles, and Q in each of the last $n_0 - 1 - r$ parity triangles. Thus, using
(C-3), we find that the effective constraint length is given by


$$n_E - 1 = r\,\frac{(Q+1)(Q+2)}{2} + (n_0 - 1 - r)\,\frac{Q(Q+1)}{2}. \tag{C-5}$$


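Finally, we reduce (C-5) using (C-4) and the fact that $R = 1/n_0$. Since
$(Q+1)(Q+2) - Q(Q+1) = 2(Q+1)$, (C-5) can be rewritten as
$n_E - 1 = (n_0 - 1)\,Q(Q+1)/2 + r(Q+1)$; substituting $Q = (J - r)/(n_0 - 1)$ from (C-4)
and $n_0 - 1 = (1-R)/R$ gives

$$n_E = \frac{1}{2}\,\frac{R}{1-R}\,J^2 + \frac{1}{2}\,J + 1 + \frac{1}{2}\,r\left[1 - r\,\frac{R}{1-R}\right], \tag{C-6}$$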

which  is  the  statement  of  the  theorem. 
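The construction can be simulated to confirm both the orthogonality and (C-6). The sketch below makes one concrete choice that satisfies the placement rule above; this choice is an assumption, and the function names are hypothetical. The first term of each polynomial goes at position 0, and each later term goes at (previous term of that polynomial) + (largest information subscript checked so far) + 1. Each check is recorded by the subscripts of the information noise bits $e^{(1)}$ other than $e_0^{(1)}$ that it checks, following (C-2).

```python
from fractions import Fraction

def construct_rate_1_over_n0(J, n0):
    """Sketch of the Theorem 10 construction for R = 1/n0.

    Terms go round-robin into G^(2)(D), ..., G^(n0)(D). Returns, for each of the J
    checks on e_0^(1), the subscripts of the other information noise bits it checks.
    """
    terms = [[] for _ in range(n0 - 1)]
    largest = 0
    checks = []
    for j in range(J):
        t = j % (n0 - 1)
        a = 0 if not terms[t] else terms[t][-1] + largest + 1
        info = [a - b for b in terms[t]]   # subscripts a - a_i for earlier terms a_i, as in (C-2)
        terms[t].append(a)
        if info:
            largest = max(largest, max(info))
        checks.append(info)
    return checks

def is_orthogonal(checks):
    """True if no information noise bit other than e_0^(1) appears in two checks."""
    seen = set()
    for info in checks:
        if seen.intersection(info) or len(set(info)) != len(info):
            return False
        seen.update(info)
    return True

def effective_constraint_length(checks):
    """One plus all noise bits checked besides e_0^(1); each check adds one parity noise bit."""
    return 1 + sum(len(info) + 1 for info in checks)

def theorem_10(J, n0):
    """Closed form (C-6), with R = 1/n0 and r = J mod (n0 - 1) from (C-4)."""
    ratio = Fraction(1, n0) / (1 - Fraction(1, n0))    # R / (1 - R)
    r = J % (n0 - 1)
    half = Fraction(1, 2)
    return half * ratio * J**2 + half * J + 1 + half * r * (1 - r * ratio)
```

For R = 1/2 this rule places the terms of $G^{(2)}(D)$ at 0, 1, 3, 7, ..., so every new difference exceeds all earlier ones.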

2.  Proof  of  Theorem  11 

For the case $R = k_0/n_0$, we seek a construction that will produce J parity checks
orthogonal on $e_0^{(j)}$ for $j = 1, 2, \ldots, k_0$. For this case, the parity-check matrix
is such that $H_\Delta$ consists of an $(n_0 - k_0) \times k_0$ array of parity triangles, one for each
of the code-generating polynomials, as shown in Fig. C-2. The construction technique is
very similar to that described above. Again, we shall place one nonzero term at a time into
each of the code-generating polynomials, following the order shown in Fig. C-2, until
J parity checks on $e_0^{(j)}$ have been formed for $j = 1, 2, \ldots, k_0$. The terms will be
chosen so that each of these sets of J parity checks is orthogonal on $e_0^{(j)}$, for
$j = 1, 2, \ldots, k_0$.

Fig. C-2. Arrangement of parity triangles in $H_\Delta$.

The manner in which the information noise bits enter into a parity check on $e_0^{(j)}$ can
be shown geometrically as in Fig. C-3. To be specific, we have drawn Fig. C-3 to
correspond to the case in which the second nonzero term is placed in the third parity
triangle in some row of Fig. C-2 for a code with $k_0 = 4$. This forms a parity check on
$e_0^{(3)}$ and on the information noise bits corresponding to the positions marked X and Y
in Fig. C-3. In addition, a single parity noise bit is also checked. Clearly, by choosing
the position of the new term large enough, all of the information noise bits checked by
this parity check, with $e_0^{(3)}$ excluded, can be made to have subscripts greater than any
that appear in the parity checks already formed on the $e_0^{(j)}$. Thus, by proceeding in
this way, the J parity checks formed on $e_0^{(j)}$ are orthogonal on $e_0^{(j)}$ for
$j = 1, 2, \ldots, k_0$. Clearly, from Fig. C-3, the set of parity checks on $e_0^{(k_0)}$ will
check the greatest number of other noise bits. By definition, $n_E$ is one plus this number.

Fig. C-3. Geometric interpretation of parity-check structure.


If we let

$$J = Q(n_0 - k_0) + r, \qquad 0 \le r < n_0 - k_0, \tag{C-7}$$

then our construction places Q + 1 parity checks on $e_0^{(k_0)}$ in each of the first r rows
of parity triangles, and Q parity checks on $e_0^{(k_0)}$ in each of the last $n_0 - k_0 - r$ rows.
From Fig. C-3, we see that the i-th parity check on $e_0^{(k_0)}$ in any parity triangle will
check exactly $k_0 i$ noise bits, exclusive of $e_0^{(k_0)}$. Thus, using Eq. C-3, we find that


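$$n_E - 1 = k_0\left[r\,\frac{(Q+1)(Q+2)}{2} + (n_0 - k_0 - r)\,\frac{Q(Q+1)}{2}\right]. \tag{C-8}$$

Finally, using the fact that $R = k_0/n_0$ and using Eq. C-7, we can reduce Eq. C-8 to

$$n_E = \frac{1}{2}\,\frac{R}{1-R}\,J^2 + \frac{k_0}{2}\,J + 1 + \frac{k_0}{2}\,r\left[1 - \frac{r}{k_0}\,\frac{R}{1-R}\right], \tag{C-9}$$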


which  is  the  statement  of  the  theorem. 
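The reduction from (C-8) to (C-9) can be confirmed with exact rational arithmetic over a range of parameters. This is a check of the algebra only, with hypothetical function names. Setting $k_0 = 1$ recovers (C-6).

```python
from fractions import Fraction

def n_e_c8(J, n0, k0):
    """Eq. C-8, with Q and r defined by (C-7): J = Q(n0 - k0) + r, 0 <= r < n0 - k0."""
    Q, r = divmod(J, n0 - k0)
    return 1 + k0 * (Fraction(r * (Q + 1) * (Q + 2), 2)
                     + Fraction((n0 - k0 - r) * Q * (Q + 1), 2))

def n_e_c9(J, n0, k0):
    """Eq. C-9, with R = k0/n0."""
    R = Fraction(k0, n0)
    ratio = R / (1 - R)
    r = J % (n0 - k0)
    return (Fraction(1, 2) * ratio * J**2 + Fraction(k0, 2) * J + 1
            + Fraction(k0, 2) * r * (1 - Fraction(r, k0) * ratio))

print(n_e_c8(6, 3, 2), n_e_c9(6, 3, 2))
```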
APPENDIX D


PROOF  OF  THEOREM  21 


1.  Preliminary  Considerations 

It will be convenient to represent the parity-check matrix, $H = [P : I]$, of a block
(n, k) code in the following form:

$$P = [h_1\, h_2 \cdots h_k], \tag{D-1}$$

where $h_1, h_2, \ldots, h_k$ are column vectors of dimension $n - k$. We use the notation $\|h_r\|$
to denote the weight of $h_r$, and we write $h_r \oplus h_s$ to denote the vector formed by adding
the components of $h_r$ and $h_s$ mod-two.


The following two lemmas will be needed in the proof of the main theorem.

LEMMA D.1: Given a block (n, k) code with minimum distance d, then

$$\|h_{\alpha_1} \oplus h_{\alpha_2} \oplus \cdots \oplus h_{\alpha_N}\| \ge d - N \tag{D-2}$$

for any distinct indices $\alpha_1, \alpha_2, \ldots, \alpha_N$ in the set $1, 2, \ldots, k$.

This is a well-known result and states simply that the sum of any N columns of P
must have weight at least d - N. The proof is very similar to that used for the
corresponding result with convolutional codes (cf. proof of Theorem 11) and will be
omitted here.
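For H = [P : I], the code word whose information part has ones exactly in positions $\alpha_1, \ldots, \alpha_N$ has parity part $h_{\alpha_1} \oplus \cdots \oplus h_{\alpha_N}$, so its weight, N plus the weight of that sum, cannot be less than d. The brute-force sketch below checks (D-2) on a small hypothetical (7, 3) example whose P columns are chosen only for illustration. The minimum distance is computed by enumerating all nonzero code words, not assumed.

```python
from itertools import combinations, product

# Columns h_1, h_2, h_3 of P for a hypothetical (7, 3) example code, H = [P : I].
P_COLS = [(1, 1, 0, 1), (1, 0, 1, 1), (0, 1, 1, 1)]

def xor(*vectors):
    """Componentwise mod-two sum of the given columns."""
    return tuple(sum(bits) % 2 for bits in zip(*vectors))

def weight(v):
    return sum(v)

def minimum_distance(cols):
    """Minimum weight of a nonzero code word: information weight plus the weight of its parity part."""
    best = None
    for info in product((0, 1), repeat=len(cols)):
        if any(info):
            chosen = [c for bit, c in zip(info, cols) if bit]
            w = sum(info) + weight(xor(*chosen))
            best = w if best is None else min(best, w)
    return best

d = minimum_distance(P_COLS)
# Lemma D.1: every mod-two sum of N distinct columns has weight at least d - N.
for N in range(1, len(P_COLS) + 1):
    for subset in combinations(P_COLS, N):
        assert weight(xor(*subset)) >= d - N
```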

We shall write $h_1 h_2 \cdots h_r$ to mean the componentwise product of $h_1, h_2, \ldots, h_r$,
that is, the vector whose j-th component is the product of the j-th components of
$h_1, h_2, \ldots, h_r$. With this notation, we have the following lemma.

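LEMMA D.2:

$$\|h_1 \oplus h_2 \oplus \cdots \oplus h_N\| = \sum_{j=1}^{N} (-2)^{j-1} \sum_{\substack{\text{all distinct sets of indices}\\ \alpha_1 < \alpha_2 < \cdots < \alpha_j}} \|h_{\alpha_1} h_{\alpha_2} \cdots h_{\alpha_j}\|. \tag{D-3}$$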

PROOF D.2: Since this result does not appear to be generally known, we shall give
a proof. Clearly, it suffices to prove that (D-3) holds for the mod-two sum of the binary
scalars $b_1, b_2, \ldots, b_N$, since the weight of a vector is the ordinary sum of its
components and both sides of (D-3) are sums of componentwise terms.

For N = 2, Eq. D-3 gives

$$b_1 \oplus b_2 = b_1 + b_2 - 2b_1 b_2. \tag{D-4}$$

Here, we use $\oplus$ to indicate mod-two addition and "+" (and $\Sigma$) to indicate ordinary
real-number addition. Equation D-4 is readily verified by testing all four possible
combinations of values of $b_1, b_2$.

We  now  assume  that  the  lemma  holds  for  N  =  M,  and  we  wish  to  show  that  it  holds 
also  for  N  =  M  +  1.  Using  the  associative  law,  we  have 
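$$b_1 \oplus b_2 \oplus \cdots \oplus b_{M+1} = (b_1 \oplus \cdots \oplus b_M) \oplus b_{M+1}. \tag{D-5}$$

Applying (D-4) to the two terms on the right of (D-5), we obtain

$$b_1 \oplus \cdots \oplus b_{M+1} = (b_1 \oplus \cdots \oplus b_M) + b_{M+1} - 2b_{M+1}(b_1 \oplus \cdots \oplus b_M). \tag{D-6}$$

Using the inductive assumption, we can write Eq. D-6 as

$$
\begin{aligned}
b_1 \oplus \cdots \oplus b_{M+1} ={}& \sum_{j=1}^{M} (-2)^{j-1} \sum_{\substack{\text{distinct indices}\\ \alpha_1 < \cdots < \alpha_j \le M}} b_{\alpha_1} b_{\alpha_2} \cdots b_{\alpha_j} + b_{M+1}\\
&+ (-2)\, b_{M+1} \sum_{j=1}^{M} (-2)^{j-1} \sum_{\substack{\text{distinct indices}\\ \alpha_1 < \cdots < \alpha_j \le M}} b_{\alpha_1} b_{\alpha_2} \cdots b_{\alpha_j}.
\end{aligned}
\tag{D-7}
$$

But Eq. D-7 is the same as

$$b_1 \oplus \cdots \oplus b_{M+1} = \sum_{j=1}^{M+1} (-2)^{j-1} \sum_{\substack{\text{distinct indices}\\ \alpha_1 < \cdots < \alpha_j \le M+1}} b_{\alpha_1} b_{\alpha_2} \cdots b_{\alpha_j}, \tag{D-8}$$

which is the statement of the lemma. By induction, the lemma is true for all N.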
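Lemma D.2 is an inclusion-exclusion identity, and for short vectors it can be confirmed exhaustively. The sketch below (function names hypothetical) compares the weight of the mod-two sum with the right-hand side of (D-3) for every choice of N binary 3-tuples, N = 1, ..., 4. Repeated vectors are allowed, since the lemma concerns distinct indices, not distinct vectors.

```python
from itertools import combinations, product
from math import prod

def weight_of_mod2_sum(vectors):
    """Left side of (D-3): the weight of h_1 (+) h_2 (+) ... (+) h_N."""
    return sum(sum(bits) % 2 for bits in zip(*vectors))

def inclusion_exclusion(vectors):
    """Right side of (D-3): sum over j of (-2)**(j-1) times the weights of the
    componentwise products of every set of j distinct indices."""
    total = 0
    for j in range(1, len(vectors) + 1):
        for subset in combinations(vectors, j):
            total += (-2) ** (j - 1) * sum(prod(bits) for bits in zip(*subset))
    return total
```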


2.  Proof  of  Theorem  21 

We wish to prove that when k is 3 or less, any block (n, k) code can be completely
orthogonalized, that is, d - 1 parity checks orthogonal on $e_j$ can be formed for
$j = 1, 2, \ldots, k$. We shall prove this result by showing that when k is three or less the
necessary conditions for minimum distance d coincide with the sufficient conditions
for complete orthogonalization. (At this point, we observe from (D-1) that $\|h_1\|$ is the
number of parity checks that check on $e_1$, $\|h_1\| - \|h_1 h_2\|$ is the number of parity checks
that check on $e_1$ but not on $e_2$, etc.)

Case 1: k = 1, $H = [h_1 : I]$.

A sufficient condition for d - 1 parity checks orthogonal on $e_1$ is

$$\|h_1\| \ge d - 1, \tag{D-9}$$

whereas, from Lemma D.1, a necessary condition for minimum distance d is

$$\|h_1\| \ge d - 1, \tag{D-10}$$

and this coincides with the sufficient condition for complete orthogonalization.

Case 2: k = 2, $H = [h_1 h_2 : I]$.

If $e_2$ appears alone in enough parity checks to eliminate $e_2$ from all but at most
one of the parity checks in which it appears with $e_1$, then the case reduces to the
previous case. Otherwise, at least as many parity checks orthogonal on $e_1$ can be
formed as there are checks on $e_1$ alone (that is, on no other information noise bits) plus
checks on $e_2$ alone plus one. Thus a sufficient condition for d - 1 checks orthogonal
on $e_1$ is

$$(\|h_1\| - \|h_1 h_2\|) + (\|h_2\| - \|h_1 h_2\| + 1) \ge d - 1 \tag{D-11}$$

or

$$\|h_1\| + \|h_2\| - 2\|h_1 h_2\| \ge d - 2. \tag{D-12}$$

By symmetry, (D-12) is also a sufficient condition for d - 1 parity checks orthogonal
on $e_2$, and hence is a sufficient condition for complete orthogonalization. Applying
Lemma D.1 to the first two columns of H and with the aid of Lemma D.2, we find that
(D-12) is also a necessary condition for minimum distance d.

Case 3: k = 3, $H = [h_1 h_2 h_3 : I]$.

If the number of parity checks on $e_2$ and $e_3$ alone is no greater than the number
on $e_1$ and $e_2$ and $e_3$, then this case reduces to the previous case. Otherwise, at least
as many parity checks orthogonal on $e_1$ can be formed as there are checks on $e_1$ alone,
plus checks on $e_2$ alone plus one, plus checks on $e_3$ alone plus one, plus checks on $e_1$
and $e_2$ and $e_3$. This follows from the fact that when the latter condition is satisfied,
there are enough checks on $e_2$ and $e_3$ to eliminate these variables from all of the
parity checks on $e_1$ and $e_2$ and $e_3$, and from the fact that $e_2$, as well as $e_3$, can
appear in one of the parity checks on $e_1$ after orthogonalization. Thus, a sufficient
condition for d - 1 parity checks orthogonal on $e_1$ is

$$
\begin{aligned}
&(\|h_1\| - \|h_1h_2\| - \|h_1h_3\| + \|h_1h_2h_3\|) + (\|h_2\| - \|h_1h_2\| - \|h_2h_3\| + \|h_1h_2h_3\| + 1)\\
&\quad + (\|h_3\| - \|h_1h_3\| - \|h_2h_3\| + \|h_1h_2h_3\| + 1) + \|h_1h_2h_3\| \ge d - 1
\end{aligned}
\tag{D-13}
$$

or

$$\|h_1\| + \|h_2\| + \|h_3\| - 2(\|h_1h_2\| + \|h_1h_3\| + \|h_2h_3\|) + 4\|h_1h_2h_3\| \ge d - 3. \tag{D-14}$$

Again, by symmetry, (D-14) is also a sufficient condition for d - 1 parity checks
orthogonal on $e_2$ or $e_3$, and hence is a sufficient condition for complete
orthogonalization. Applying Lemmas D.1 and D.2 to the first three columns of H, we find that
(D-14) is also a necessary condition for minimum distance d.

This  proves  the  theorem. 
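The conditions of Cases 2 and 3 can be evaluated mechanically. The sketch below uses a small hypothetical (7, 3) example (P columns chosen only for illustration, with d found by brute force). It computes the left sides of (D-12), (D-13), and (D-14), where `w(...)` stands for $\|h_{\alpha_1} h_{\alpha_2} \cdots\|$. It confirms that (D-13) exceeds (D-14) by exactly 2 for every triple of columns. It also confirms that the left side of (D-14) equals $\|h_1 \oplus h_2 \oplus h_3\|$, as Lemma D.2 requires, which is why (D-14) is also the necessary condition supplied by Lemma D.1. This example does not determine which branch of the Case 3 argument applies.

```python
from itertools import product

# Hypothetical example: columns h_1, h_2, h_3 of P for H = [P : I].
P_COLS = [(1, 1, 0, 1), (1, 0, 1, 1), (0, 1, 1, 1)]

def w(*vectors):
    """Weight of the componentwise product of the given columns."""
    return sum(all(bits) for bits in zip(*vectors))

def lhs_d12(h1, h2):
    return w(h1) + w(h2) - 2 * w(h1, h2)

def lhs_d13(h1, h2, h3):
    """Checks on e1 alone, e2 alone + 1, e3 alone + 1, and checks on all three."""
    alone1 = w(h1) - w(h1, h2) - w(h1, h3) + w(h1, h2, h3)
    alone2 = w(h2) - w(h1, h2) - w(h2, h3) + w(h1, h2, h3)
    alone3 = w(h3) - w(h1, h3) - w(h2, h3) + w(h1, h2, h3)
    return alone1 + (alone2 + 1) + (alone3 + 1) + w(h1, h2, h3)

def lhs_d14(h1, h2, h3):
    return (w(h1) + w(h2) + w(h3)
            - 2 * (w(h1, h2) + w(h1, h3) + w(h2, h3))
            + 4 * w(h1, h2, h3))

def min_distance(cols):
    """Minimum weight of a nonzero code word of the systematic code with H = [P : I]."""
    m = len(cols[0])
    best = None
    for info in product((0, 1), repeat=len(cols)):
        if any(info):
            parity = [sum(c[row] for bit, c in zip(info, cols) if bit) % 2 for row in range(m)]
            weight = sum(info) + sum(parity)
            best = weight if best is None else min(best, weight)
    return best
```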